SELECTION MARKERS USEFUL FOR HETEROLOGOUS PROTEIN EXPRESSION 
All documents cited herein are incorporated by reference in their entirety. 

TECHNICAL FIELD 

This invention is in the field of the recombinant expression of proteins in heterologous hosts. 

BACKGROUND ART 

Recombinant expression of proteins is of huge importance. For convenience, bacterial hosts such as 
E.coli are typically used. Where bacterial hosts are unsuitable (e.g. where protein glycosylation or 
other modifications are desired, or where proteins are not expressed for one reason or another) it is 
common to choose a yeast host, a baculovirus host, or perhaps a cell line derived from a higher 
eukaryote, such as a CHO cell line. Plants are also used as recombinant expression hosts. 

Although recombinant protein expression is often routine, with off-the-shelf kits being available for 
general use, many proteins cannot easily be expressed in this way. Bacterial hosts often give 
insoluble proteins which must be purified and re-folded from inclusion bodies, and they do not offer 
eukaryotic post-translational modifications. Yeasts (including Saccharomyces) grow poorly when 
minimal media are required by the selection systems that are commonly used, and Pichia systems [1] 
are generally useful only for secreted proteins. The baculovirus and CHO systems are cumbersome 
and expensive, and do not store well by freezing. Plant systems are at an early stage and extensive 
post-expression processing is required. Moreover, transformed hosts are typically unstable, such that 
it is constantly necessary to impose selective conditions to prevent reversion to a non-transformed 
state, e.g. by loss of expression plasmids. For these reasons, hosts such as Saccharomyces are 
seen as poor choices for general recombinant expression. 

Thus there remains a need for an expression system which avoids the need for expensive reagents, 
which is genetically stable, which can be frozen well, which can grow quickly and abundantly, and 
which can produce eukaryotic proteins in a soluble and active form. It is an object of the invention to 
provide an improved expression system to address these needs. 

DISCLOSURE OF THE INVENTION 

The invention is based on the use of a new class of selection marker in expression vectors. 

Selection markers used in prior art systems are often based on including a resistance gene in the 
vector, e.g. an antibiotic resistance gene (e.g. ampicillin resistance, ampR), a drug resistance gene 
(e.g. neomycin resistance), a herbicide resistance gene (e.g. glyphosate resistance), the HPRT/HAT 
system, etc. When used with a host that is naturally sensitive to the factor in question, the resistance 
genes mean that only transformed cells can survive in a medium containing the factor. 
Other selection markers are based on auxotrophic hosts, i.e. those which require a particular factor in 
order to survive. Auxotrophic host systems are by far the most commonly used for yeasts [2], usually 
using URA3 (for uracil auxotrophs), LEU2 (for leucine auxotrophs), TRP1 (for tryptophan 
auxotrophs) or HIS3 (for histidine auxotrophs) to complement the mutations in the auxotrophic host 
and confer prototrophy. The hosts can grow in rich medium, but growth in a medium lacking an 
essential factor (e.g. lacking leucine) leads to cell death. Inclusion of a survival gene (e.g. LEU2, 
which encodes 3-isopropylmalate dehydrogenase) on a plasmid ensures that growth in the 
appropriate minimal medium selects only transformants. On transfer to a rich medium, where 
selection pressure is absent, auxotrophic hosts tend to lose plasmids encoding the selection markers. 

These prior art selection systems are based on using a growth medium in which only transformants 
can survive, either by including the lethal factor (transformants are resistant) or by omitting the 
essential factor (transformants are not auxotrophic). The markers are thus conditional, as the 
selection pressure applies only under certain conditions. In contrast, the selection markers used 
according to the present invention are non-conditional, i.e. the selection pressure is absolute. The 
markers involved are genes which encode essential survival factors, and loss of the marker gene (e.g. 
by loss of the expression vector) is lethal. By avoiding resistance markers, lethal factors (e.g. 
antibiotics) do not have to be added to culture media, thus simplifying the culture process, reducing 
costs and avoiding contamination of the expressed protein. By avoiding auxotrophic hosts, cells can 
be grown in rich media rather than in minimal media, thereby giving much better growth rates. 
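The contrast between conditional and non-conditional markers can be made concrete with a toy model. The Python sketch below is an illustrative assumption, not a method taken from this specification: each marker type is reduced to one rule stating whether a cell that has lost the marker-bearing vector can still grow in a given medium. The names `Medium`, `survives_without_vector` and `MEDIA` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Medium:
    """Hypothetical description of a growth medium."""
    contains_antibiotic: bool  # lethal factor added (resistance-marker selection)
    rich: bool                 # False = minimal medium lacking the auxotroph's required nutrient


def survives_without_vector(marker_type: str, medium: Medium) -> bool:
    """True if a cell that has LOST the marker-bearing vector can still grow.

    marker_type is "resistance", "auxotrophic" or "essential" (toy categories).
    """
    if marker_type == "resistance":
        # Plasmid-free cells are killed only when the lethal factor is present.
        return not medium.contains_antibiotic
    if marker_type == "auxotrophic":
        # Plasmid-free auxotrophs grow whenever the medium supplies the missing factor.
        return medium.rich
    if marker_type == "essential":
        # Loss of an essential gene is lethal in every medium.
        return False
    raise ValueError(f"unknown marker type: {marker_type}")


MEDIA = {
    "rich, no antibiotic": Medium(contains_antibiotic=False, rich=True),
    "rich + antibiotic": Medium(contains_antibiotic=True, rich=True),
    "minimal, no antibiotic": Medium(contains_antibiotic=False, rich=False),
}

for marker in ("resistance", "auxotrophic", "essential"):
    escapes = [name for name, m in MEDIA.items() if survives_without_vector(marker, m)]
    print(f"{marker:12s} plasmid-free survivors in: {escapes or 'none'}")
```

Only the essential-gene rule has no medium in which a plasmid-free cell escapes, which is the sense in which the selection pressure is described as absolute.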

Thus the invention provides a cell that expresses both chromosomal genes and extra-chromosomal 
genes, wherein (a) the expressed extra-chromosomal genes include a gene with an essential function, 
the expression of which is unconditionally required for survival of the cell, (b) the expressed 
chromosomal genes do not provide that essential function, and (c) the extra-chromosomal genes 
include a heterologous gene, the expression of which is controlled by a promoter that is functional in 
the cell. Loss of the extra-chromosomal essential gene is lethal to the cell. 

The invention also provides a method for expressing a heterologous gene, comprising the step of 
growing a cell of the invention in a culture medium. The invention also provides a method for 
purifying a protein, comprising the steps of: (a) growing a cell of the invention such that it expresses 
said protein; and (b) purifying the protein. The method may involve the step of: (c) treating the 
protein with a protease to provide a cleavage product of interest, and this step (c) may follow step (b) 
or may be an intrinsic part of step (b). 

The cell of the invention can be constructed in two steps, as illustrated for yeast in Figure 6 and as 
described below. The invention uses a starting cell that expresses both chromosomal genes and 
extra-chromosomal genes, wherein (a) the expressed extra-chromosomal genes include a gene with 
an essential function, the expression of which is unconditionally required for survival of the cell, 
(b) the expressed chromosomal genes do not provide that essential function, and (c) the 
extra-chromosomal genes include a conditionally-lethal gene. 
The invention also provides an intermediate cell which expresses chromosomal genes, a first set of 
extra-chromosomal genes and a second set of extra-chromosomal genes, wherein (a) the expressed 
first and second sets of extra-chromosomal genes both include a gene with the same essential 
function, the expression of which is unconditionally required for survival of the cell, (b) the 
expressed chromosomal genes do not provide that essential function, (c) the first set of 
extra-chromosomal genes includes a conditionally-lethal gene, and (d) the second set of 
extra-chromosomal genes includes both a conditionally-required gene and a heterologous gene. 

The invention also provides an extra-chromosomal vector, comprising: (a) an essential gene whose 
expression is unconditionally required for survival of a cell of interest; (b) a conditionally-required 
gene to allow selection of host cells which include the extra-chromosomal vector; and (c) a gene 
encoding a heterologous protein of interest operably linked to a promoter that is functional in the cell 
of interest. 

The invention also provides a method for preparing a cell of the invention, comprising the steps of: 
(a) obtaining a starting cell, which expresses a conditionally-lethal gene; (b) transforming the starting 
cell with an extra-chromosomal vector of the invention; (c) selecting transformants which express the 
vector's conditionally-required gene; and then (d) selecting transformants which lose the 
conditionally-lethal gene. 

The invention alternatively provides a cell which expresses chromosomal genes and 
extra-chromosomal genes, wherein (a) the expressed extra-chromosomal genes include an essential 
gene whose expression is unconditionally required for survival of the cell, (b) the expressed 
chromosomal genes do not include said essential gene, and (c) the extra-chromosomal genes include 
a heterologous gene, the expression of which is controlled by a promoter that is functional in the cell. 

Essential genes 

The invention is based on the use of genes with essential functions as selection markers. Vectors 
encoding heterologous products of interest also encode the essential gene. As loss of the essential 
function is unconditionally lethal, the selection pressure for cells which contain the vector is absolute, 
i.e. surviving cells must contain the vector with both the essential gene and the heterologous gene. 

The essential gene can be any gene whose loss prevents the growth of cells, e.g. the loss prevents cell 
division, prevents mitosis, prevents transcription, prevents translation, or prevents any other 
metabolic process which is essential for survival in culture. A gene is not an "essential gene" if its 
expression is required for survival only under certain conditions. For example, ampR is essential in the 
presence of ampicillin but not under other circumstances, so ampR is not an "essential gene": its loss 
can be compensated by a change in growth conditions (omitting ampicillin), whereas no change in 
growth conditions can compensate for the loss of an "essential gene". 

The identification of essential genes is straightforward, e.g. using knockout studies. Reference 3 
lists various essential genes in E.coli, including some which are only conditionally-lethal, and the 
profile of the E.coli chromosome in reference 4 classifies genes as non-essential or essential. 
Reference 5 lists various essential genes for yeast, and the EUROSCARF [6] and EUROFAN [7,8] 
projects have also identified essential genes in yeast. EUROFAN defines an essential gene as one 
which is "imperative for the vegetative life cycle of a yeast cell grown on rich YPD media at 30°C", 
and estimated that 16-18% of yeast genes were essential on the basis that "a strain deleted for such a 
gene cannot grow on YPD at 30°C". As well as these functional studies, genomics (particularly 
comparative genomics) is often used to identify essential genes [9], and has been applied to E.coli, 
yeasts, Mycobacterium tuberculosis [10], etc. A further approach to identifying essential genes is 
given in reference 11. The DEG "database of essential genes" [12,13] is a further source. The skilled 
person is thus readily able to identify various genes whose absence cannot be tolerated by a host. 

The essential gene is preferably short, e.g. with a coding sequence (start codon to stop codon 
inclusive) of <3000 base pairs (e.g. <2500 bp, <2000 bp, <1500 bp, <1250 bp, <1000 bp, or shorter). 
The use of short genes is preferred because it reduces the potential for duplication of restriction sites 
within a vector. If restriction sites are duplicated, however, then codons can be changed to remove 
the recognition sequence without changing the encoded amino acid(s) or, as an alternative, the vector 
may be equipped for ligase independent cloning (LIC) as described below. 
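The length check and the synonymous-codon fix described above can be sketched in Python. Everything here is illustrative: the helper names (`find_sites`, `cds_length_ok`, `remove_site_synonymously`), the EcoRI recognition sequence GAATTC used as the example site and the toy coding sequence are assumptions, and the code uses the standard genetic code (NCBI table 1). A real redesign would also take the host's codon preferences into account.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), indexed in TCAG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def cds_length_ok(cds: str, limit: int = 3000) -> bool:
    """Coding sequence (start to stop codon inclusive) shorter than limit bp."""
    return len(cds) < limit


def remove_site_synonymously(cds: str, site: str) -> str:
    """Destroy each occurrence of site in an in-frame CDS with single synonymous changes."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for _ in range(len(cds)):  # bounded: each pass removes at least one occurrence
        hits = find_sites("".join(codons), site)
        if not hits:
            return "".join(codons)
        start, fixed = hits[0], False
        for pos in range(start, start + len(site)):
            idx, offset = divmod(pos, 3)
            old = codons[idx]
            for base in "ACGT":
                new = old[:offset] + base + old[offset + 1:]
                if new != old and CODON_TABLE[new] == CODON_TABLE[old]:
                    trial = codons[:idx] + [new] + codons[idx + 1:]
                    # Accept only if the total number of sites actually drops.
                    if len(find_sites("".join(trial), site)) < len(hits):
                        codons, fixed = trial, True
                        break
            if fixed:
                break
        if not fixed:
            raise ValueError(f"no single synonymous change removes {site} at {start}")
    return "".join(codons)


cds = "ATGGAATTCAAAGAATTCTAA"  # toy ORF (M E F K E F stop) with two EcoRI sites
print(cds_length_ok(cds), find_sites(cds, "GAATTC"))  # True [3, 12]
print(remove_site_synonymously(cds, "GAATTC"))         # ATGGAGTTCAAAGAGTTCTAA
```

Here each GAA (Glu) codon overlapping a site is rewritten as GAG (also Glu), so the encoded protein is unchanged while both recognition sequences disappear.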

One advantage of the invention is that high copy numbers of the heterologous gene can be obtained, 
and this is accompanied by hyper-expression of the essential gene. Thus the essential gene is 
preferably not lethal when hyper-expressed. To achieve maximum copy number, it is preferred that 
the essential gene should be required by the host at high levels. 

Preferred essential genes include those which encode polypeptides with (a) a molecular weight of 
less than about 40 kDa (e.g. <30 kDa, <20 kDa, or <10 kDa), and/or (b) reasonable cellular abundance, 
as indicated by a codon adaptation index (CAI [14]) of more than about 0.3. Genes which 
satisfy these criteria in yeast include: CDC33, COF1, EFB1, ERG25, FBA1, GFUl, GSP1, GUK1, 
HEM13, HSP10, IPP1, NHP2, NOP2, NOP10, NTF2, PFY1, PSA1, RLP24, RPB10, RPCW, RPL5, 
RPL10, RPL15A, RPL17A, RPL18A, RPL25, RPL28, RPL30, RPL32, RPL33A, RPL43A, RPP0, 
RPS2, RPS3, RPS5, RPS13, RPS15, RPS20, RPS31, SAR1, SEC14, SMT3, SNU13, SSS1, SUI2, 
TIF11, TPI1, VRG4, and YRB1. 
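The two screening criteria above can be sketched in Python. The CAI of reference 14 is the geometric mean, over a gene's codons, of each codon's relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon). The reference counts below are hypothetical, the 110 Da average residue mass is a rough assumption used only to estimate molecular weight, and flooring zero counts at 0.5 is an assumed convention to avoid log(0). Because the text says "and/or", `meets_criteria` returns both flags rather than combining them.

```python
import math

# Hypothetical reference codon counts from highly expressed genes, grouped by
# amino acid. A real calculation would use a table derived from the host genome.
REFERENCE_COUNTS = {
    "K": {"AAA": 40, "AAG": 160},
    "E": {"GAA": 180, "GAG": 20},
    "F": {"TTT": 50, "TTC": 150},
}

AVERAGE_RESIDUE_DA = 110.0  # rough average residue mass (assumption)


def relative_adaptiveness(reference):
    """w = count / count of the most-used synonymous codon."""
    weights = {}
    for codon_counts in reference.values():
        best = max(codon_counts.values())
        for codon, count in codon_counts.items():
            # Assumption: zero counts are floored at 0.5 to avoid log(0).
            weights[codon] = max(count, 0.5) / best
    return weights


def cai(cds, reference):
    """Geometric mean of w over the codons of cds found in the reference table.

    Codons missing from the table (here ATG and stop codons) are skipped.
    """
    w = relative_adaptiveness(reference)
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def meets_criteria(length_aa, cai_value, max_kda=40.0, min_cai=0.3):
    """Return (small enough, abundant enough) using the thresholds in the text."""
    est_kda = length_aa * AVERAGE_RESIDUE_DA / 1000
    return est_kda < max_kda, cai_value > min_cai


# Cdc33: 212 aa and CAI 0.387, the yeast figures quoted in this document.
print(meets_criteria(212, 0.387))                         # (True, True)
print(round(cai("ATGAAGGAATTCTAA", REFERENCE_COUNTS), 3))  # 1.0
```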

Preferred essential genes include those involved in cell cycle control and/or involved in mitosis. 

A preferred essential gene for use with the invention is MOB1, whose expression is absolutely 
required for completion of mitosis and maintenance of ploidy in yeast [15]. The yeast gene is less 
than 750 bp in length, and hyper-expression of the encoded Mob1 protein is tolerated. 

Another preferred essential gene for use with the invention is Cdc33 (also known as eIF4E), which 
recognises the 7-methylguanosine-containing cap of mRNA in the first step of mRNA recruitment 
for translation. The Cdc33 protein has 212 aa in yeast and is abundant, as judged by direct assays and 
by its CAI of 0.387. Furthermore, as CDC33 is a translation factor, increased expression 
levels caused by copy number amplification may have a beneficial effect on heterologous protein 
expression. Over-expression of CDC33 can cause slow growth, but this effect can be overcome in a 
Δcln1 or Δcln2 background [16] and should in any case be negligible over a typical 4-8 hr induction period. 

Another preferred essential gene for use with the invention is Cdc28, which is a protein of 298 aa in 
yeast. It is a serine/threonine protein kinase which is essential for completion of START, the 
controlling event in the cell cycle. More than 200 substrates have been identified. 

Another preferred essential gene for use with the invention is Hsp10, which is a 10 kDa mitochondrial 
chaperonin in yeast (homologue of E.coli GroES) that regulates the Hsp60 chaperonin [17]. Hsp10 is 
involved in protein folding and sorting in mitochondria. 

Other essential genes for use with the invention can be identified empirically, e.g. by the use of 
chromosomal knockout techniques to identify lethal knockout mutations, combined with a test for 
whether the lethal effect can be reversed by supplying a copy of the knocked-out gene on a plasmid. 
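The empirical test described above (a lethal knockout that is reversed by a plasmid-borne copy) can be expressed as a small classification rule. This is a hypothetical sketch: the `KnockoutResult` record, the condition labels and the genes `gene_x` and `gene_y` are invented for illustration, while the ampR case mirrors the example given earlier in this section. Because a screen can only test a finite set of conditions, the strongest call it can make is an essential-gene candidate.

```python
from dataclasses import dataclass, field


@dataclass
class KnockoutResult:
    """Hypothetical record from a knockout-and-rescue screen."""
    gene: str
    growth_by_condition: dict = field(default_factory=dict)  # condition -> deletion strain grows?
    rescued_by_plasmid: bool = False  # does a plasmid-borne copy restore growth?


def classify(result):
    if not result.growth_by_condition:
        raise ValueError("no growth conditions tested")
    grows = result.growth_by_condition.values()
    if all(grows):
        return "non-essential"
    if any(grows):
        # Lethal only under some conditions, like ampR in the presence of ampicillin.
        return "conditionally required"
    # Lethal in every tested condition: a candidate only if a plasmid copy rescues it.
    return "essential candidate" if result.rescued_by_plasmid else "lethal, not rescued"


screen = [
    KnockoutResult("ampR", {"no ampicillin": True, "ampicillin": False}, True),
    KnockoutResult("gene_x", {"YPD 30C": False, "YPD 25C": False}, True),  # hypothetical
    KnockoutResult("gene_y", {"YPD 30C": True}),                           # hypothetical
]
for r in screen:
    print(r.gene, classify(r))
```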

In cells of the invention, the essential gene is expressed from an extra-chromosomal element rather 
than from a chromosomal site. Loss of the extra-chromosomal gene results in death of the cell. 

The use of an essential gene makes the system inherently stable, and so is preferable to the use of a 
conventional resistance or auxotrophic marker for several reasons. For instance: the need for minimal 
selective media is avoided, thus giving higher growth rates; there is no risk of the final product being 
contaminated by the resistance molecule (e.g. antibiotic contamination); and, for cells such as yeasts, 
the need for expensive anti-microbials is avoided. 

As the invention utilises genes that are essential, the absence of that gene from a host's 
chromosome(s) means that a functional copy of the gene has been lost from the chromosome, to be 
replaced by the extra-chromosomal gene. It will be understood that the replacement gene need not be 
precisely the same as the gene which has been lost. Tolerable differences include point mutations that 
change the gene's sequence without changing the encoded amino acid sequence, point mutations that 
change the encoded amino acid sequence without functional consequence, the addition of fusion 
sequences (e.g. a GST fusion of MOB1 can be used to replace native MOB1), and the use of a gene 
that is different from the lost chromosomal copy (e.g. from a different species, or even a different 
type of organism) but which is functionally able to complement that loss. Taking S.cerevisiae as an 
example, therefore, the host could lack an essential gene which is complemented by the 
corresponding gene from S.pombe or from any other eukaryote. The use of a non-identical gene 
which is less efficient than the native chromosomal gene can further enhance copy number 
amplification, as described below. However, the use of extra-chromosomal genes which are the same 
as those found wild-type in the host organism's chromosome is not excluded. 
Preparing the cell 

Cells of the invention have lost an essential gene on their chromosome(s), but complement that loss 
using an extra-chromosomal copy of the gene. As loss of an essential gene cannot be tolerated, it is 
not feasible to make cells of the invention simply by deleting the chromosomal copy and then 
transforming the mutant cells with a vector encoding the gene: the deletion mutants die before they 
can be transformed, so there is no way of selecting for cells which lack the essential gene. Instead, 
cells of the invention can be prepared by means of "plasmid shuffling" [18], involving a transitional 
stage where cells possess the essential gene in two separate extra-chromosomal forms (e.g. see Figure 6). 

The overall shuffling process begins with a mutant cell that lacks a chromosomal copy of an essential 
gene, but which possesses a replacement copy on a first vector, which vector also contains a 
conditionally-lethal marker. A second vector of the invention (carrying (a) a further replacement 
essential gene, (b) a conditionally-required marker, and (c) a heterologous gene) is then introduced, and 
transformants are selected on the basis of the second vector's conditionally-required marker. At this stage 
the cell contains two extra-chromosomal copies of the essential gene, one on a first vector which 
contains a negative selection marker and one on a second vector which contains a positive selection 
marker and a heterologous gene. A cell that loses either vector (but not both) still retains the essential 
gene, but only the second vector is useful for heterologous protein expression. Thus the process then proceeds 
to eliminate cells which retain the first vector, thereby selecting cells which possess only the second 
vector. This final selection uses the first vector's conditionally-lethal marker, to yield cells in which 
the essential gene and the heterologous gene are encoded by the same vector. The overall effect of 
this process, therefore, is to replace the first vector with the second vector. Cells which lose both 
vectors lose the essential gene and thus die. 
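The shuffle can be simulated as set operations on the plasmid content of cells. This Python sketch is a hypothetical model rather than a protocol: marker names follow the S.cerevisiae arrangement described in this section (URA3 on the covering plasmid, TRP1 on the expression plasmid), the host is assumed to be a ura3 trp1 strain whose chromosomal copy of the essential gene is deleted, and plasmid loss during non-selective growth is modelled by enumerating every subset of plasmids a cell could retain.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Plasmid:
    name: str
    carries_essential_gene: bool
    markers: frozenset


# Assumed layout: URA3 (counterselectable with FOA) on the covering plasmid,
# TRP1 (selectable on Trp-free medium) on the expression plasmid.
COVERING = Plasmid("covering", True, frozenset({"URA3"}))
EXPRESSION = Plasmid("expression", True, frozenset({"TRP1", "heterologous gene"}))


def viable(cell):
    """A cell survives only if some plasmid supplies the essential gene."""
    return any(p.carries_essential_gene for p in cell)


def markers(cell):
    return set().union(*(p.markers for p in cell))


def segregants(cell):
    """Every plasmid subset that progeny could retain during non-selective growth."""
    plasmids = list(cell)
    return {frozenset(c) for r in range(len(plasmids) + 1) for c in combinations(plasmids, r)}


def select(population, keep):
    return {cell for cell in population if viable(cell) and keep(cell)}


start = frozenset({COVERING})
transformed = {start, start | {EXPRESSION}}  # only some cells take up the expression plasmid
intermediates = select(transformed, lambda c: "TRP1" in markers(c))  # Trp-free medium
grown = set().union(*(segregants(c) for c in intermediates))
final = select(grown, lambda c: "URA3" not in markers(c))  # FOA kills URA3+ cells

for cell in final:
    print(sorted(p.name for p in cell))  # ['expression']
```

The empty cell (both plasmids lost) appears among the segregants but is removed by `viable`, matching the statement that cells losing both vectors die.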

The invention can be performed much more quickly than existing eukaryotic expression systems, 
such as Pichia and baculovirus, and essentially as quickly as with advanced bacterial expression 
systems. Once the desired DNA fragment is cloned into the plasmid of the invention, a yeast host 
expressing high levels of the protein can be prepared in less than two weeks. 

Overall, the shuffling process involves: (a) a host cell with an inactive chromosomal essential gene, 
complemented by a 'covering' plasmid which supplies the essential gene and contains a 
counterselection marker; and (b) an expression plasmid which also supplies an essential gene and 
contains the heterologous gene of interest (usually under the control of a repressible promoter) plus a 
selection marker. The shuffling protocol swaps the two plasmids without going via a stage where the 
extra-chromosomal essential gene is lost. 

In S.cerevisiae a covering plasmid will generally include the URA3 counterselection marker, the 
expression plasmid will include a selection marker (e.g. an auxotrophic marker), and the expression of 
the heterologous product will be controlled by the GAL1-10 promoter, which is induced by galactose 
and repressed by glucose. The URA3 marker advantageously allows selection of starting cells which 
contain the covering plasmid and also, using FOA, allows counterselection of intermediate cells. 
Similar considerations apply in S.pombe, although the heterologous product may be controlled by 
thiamine repression of the nmt1 promoter. 

In E.coli and other applicable bacteria a covering plasmid may include the sacB gene from B.subtilis. 
This gene prevents growth on sucrose, permitting counterselection. Unlike URA3, the sacB gene does 
not also allow positive selection, and so the covering plasmid will also include a marker such as 
kanR for selecting suitable starting cells. 

As an alternative to the sacB system, the rpsL system can be used. Cells carrying the wild-type rpsL 
allele (StrS) are sensitive to streptomycin, but many rpsL mutations give streptomycin resistance (StrR). 
If a cell has both StrS and StrR alleles, however, it remains sensitive to streptomycin. A covering 
plasmid can thus contain wild-type rpsL and kanR. Using a StrR starting cell and an expression 
plasmid with ampR, the intermediate cells can be selected based on ampicillin resistance. Loss of the 
covering plasmid can then be selected based on streptomycin resistance. 
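The key point of the rpsL scheme is that streptomycin sensitivity is dominant. The Python sketch below encodes that rule together with the plasmid layout described above (wild-type rpsL and kanR on the covering plasmid, ampR on the expression plasmid, a StrR chromosome lacking the essential gene). The gene labels such as `rpsL_wt` and `rpsL_mut` are hypothetical placeholders, not real identifiers.

```python
# Starting strain chromosome: mutant rpsL (StrR); the essential gene is deleted.
CHROMOSOME = {"rpsL_mut"}
COVERING = {"rpsL_wt", "kanR", "essential"}
EXPRESSION = {"ampR", "essential", "heterologous"}


def resists(genes, drug):
    if drug == "streptomycin":
        # StrS is dominant: any wild-type rpsL copy makes the cell sensitive.
        return "rpsL_mut" in genes and "rpsL_wt" not in genes
    return {"kanamycin": "kanR", "ampicillin": "ampR"}[drug] in genes


def grows(genes, drugs):
    return "essential" in genes and all(resists(genes, d) for d in drugs)


starting = CHROMOSOME | COVERING
intermediate = CHROMOSOME | COVERING | EXPRESSION
final = CHROMOSOME | EXPRESSION

print(grows(starting, {"kanamycin"}))                       # True: select starting cells
print(grows(intermediate, {"ampicillin"}))                  # True: select intermediates
print(grows(intermediate, {"ampicillin", "streptomycin"}))  # False: covering plasmid present
print(grows(final, {"ampicillin", "streptomycin"}))         # True: covering plasmid lost
```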

The combined use of the sacB and strA systems in E.coli is described in reference 19. 

The invention uses a starting cell which expresses chromosomal genes and extra-chromosomal 
genes, wherein (a) the expressed extra-chromosomal genes include an essential gene whose 
expression is unconditionally required for survival of the cell, (b) the expressed chromosomal genes 
do not include said essential gene, and (c) the extra-chromosomal genes include a 
conditionally-lethal gene. Suitable starting cells have been described in the art for various essential 
genes [e.g. 20,21]. The invention provides a starting cell, characterised in that (i) the cell is a 
S.cerevisiae yeast, and (ii) the essential gene is MOB1, Cdc33 or Hsp10. 

As an alternative to using a plasmid shuffling approach, it is possible to prepare cells of the invention 
from diploid cells that are hetero-allelic for an essential gene, i.e. cells that contain a diploid genome 
but which express a functional form of the essential gene from only one haploid set of chromosomes. 
The hetero-allelic cell is transformed with a plasmid encoding both the essential gene and the 
heterologous gene of interest and, after sporulation, haploids lacking a functional chromosomal gene 
are selected [22]. This technique is more complicated than plasmid shuffling, but may be preferred if 
there is frequent recombination between chromosomes and shuffling plasmids. 
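Why the selected haploids must carry the plasmid can be checked by enumeration. This Python sketch assumes ordinary 2:2 segregation of the hetero-allelic locus in each tetrad and treats plasmid inheritance as an independent yes/no per spore; the allele labels "ESS" and "null" are hypothetical. How null haploids are picked out in practice (for example by a marker at the disrupted locus) is outside the sketch.

```python
from itertools import product

# A hetero-allelic diploid gives two spores with the functional allele ("ESS")
# and two with the non-functional allele ("null") per tetrad (2:2 segregation).
TETRAD = ("ESS", "ESS", "null", "null")


def spore_viable(allele, has_plasmid):
    """A haploid spore needs the essential function from its chromosome or the plasmid."""
    return allele == "ESS" or has_plasmid


viable_spores = []
# Enumerate every pattern of plasmid inheritance across the four spores.
for inherited in product((True, False), repeat=len(TETRAD)):
    for allele, has_plasmid in zip(TETRAD, inherited):
        if spore_viable(allele, has_plasmid):
            viable_spores.append((allele, has_plasmid))

# Viable haploids lacking a functional chromosomal gene: the cells that are wanted.
wanted = [s for s in viable_spores if s[0] == "null"]
print(len(viable_spores), len(wanted), all(has_plasmid for _, has_plasmid in wanted))  # 48 16 True
```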

Extra-chromosomal genes and vectors 

Cells of the invention include extra-chromosomal genes, which are located on an extra-chromosomal 
vector. Such vectors do not include DNA of the mitochondria, chloroplasts or kinetoplasts (where 
applicable). Preferred vectors are capable of autonomous replication, i.e. their copy number can 
exceed the copy number of the host cell's own chromosome(s). Preferred vectors are non-integrating 
(unlike the situation with prior art Pichia systems). The extra-chromosomal genes will generally be 
found on a plasmid or in a viral vector. 
Plasmids of the invention include an essential gene, such that (a) the plasmid can complement the 
lack of that gene in a host's chromosome, and (b) loss of the plasmid is lethal to the cell. 

Plasmids of the invention also include a heterologous gene. 

Plasmids of the invention will usually also include a conditionally-required gene. This gene is not 
required for survival of a cell of the invention, but may be used during the cell's preparation (see 
below). Conditionally-required genes allow transformants to be selected under appropriate selective 
growth conditions, and may confer resistance to an otherwise-toxic substance (e.g. an antibiotic 
resistance gene, such as ampR, kanR, tetR, hyg, etc.; a drug resistance gene, such as aad, ble, dhfr, 
hpt, nptII, aphII, gat, pac, neoR, etc.; a herbicide resistance gene, such as bar, pat, csr1-1, shpd, 
epsp, etc.; and other resistance genes, such as ble, bsd, gpt, hisD, trpB, hprt, tk) or treatment (e.g. 
irradiation, mutagenesis), or may complement an auxotrophic mutation in the host's chromosome 
(e.g. the URA3, LEU2, TRP1, HIS3, LYS2, ADE2, ADE3 genes, etc.). A preferred conditionally-required 
gene is TRP1, which can be used to select yeast transformants on the basis of growth in a 
Trp-free medium. 

Other plasmids used in preparing host cells of the invention (e.g. plasmids used to prepare starting 
cells, and retained in intermediate cells of the invention) include the same essential gene as described 
above, but include a conditionally-lethal gene for counterselection. Cells containing these plasmids 
can thus be selectively killed. Typical conditionally-lethal genes encode proteins which convert 
non-toxic substances into toxic substances, and examples include, but are not limited to: URA3 
(lethal in the presence of 5-fluoroorotic acid, FOA); LYS2 (lethal in the presence of α-aminoadipic 
acid as the primary nitrogen source); CAN1 (lethal in the presence of canavanine and absence of 
arginine); CYH2 (lethal in the presence of cycloheximide); Tk or thymidine kinase (lethal in the 
presence of ganciclovir or acyclovir); Cd or cytosine deaminase (lethal in the presence of 
5-fluorocytosine); Ntr or nitroreductase (lethal in the presence of CB1954); sacB from B.subtilis 
(lethal in the presence of sucrose); rpsL and mutant rpsL (selection based on streptomycin 
sensitivity/resistance); etc. 

Some conditionally-required genes (for "positive selection") can also be used as conditionally-lethal 
genes (for "negative selection"), depending on growth conditions. For example, URA3 is a 
conditionally-required gene for uracil auxotrophs, but it is lethal when growth occurs in the presence 
of FOA. Similarly, thymidine kinase offers a salvage pathway in the presence of aminopterin, but is 
lethal in the presence of acyclovir. As a further example, dao1, encoding D-amino acid oxidase (DAAO), 
has been described in plants [23], where selection is based on the differing toxicity of D-amino acids 
and their metabolites: D-alanine and D-serine are toxic to plants but can be metabolised 
by DAAO to non-toxic products, while D-isoleucine and D-valine have low toxicity but are 
metabolised by DAAO into toxic keto acids. Where a process of the invention uses both a 
conditionally-required gene and a conditionally-lethal gene, however, different genes will usually be 
used. 
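Choosing a marker pair for a shuffle amounts to a lookup plus one consistency check. The Python sketch below transcribes the counterselection conditions listed in this section into a table; the helper `plan_shuffle` is hypothetical and only enforces the rule stated above that different genes are used for positive and negative selection.

```python
# Counterselection conditions for the conditionally-lethal genes listed in this
# section (keys use the gene names given there).
COUNTERSELECTION = {
    "URA3": "5-fluoroorotic acid (FOA)",
    "LYS2": "alpha-aminoadipic acid as the primary nitrogen source",
    "CAN1": "canavanine, in the absence of arginine",
    "CYH2": "cycloheximide",
    "Tk": "ganciclovir or acyclovir",
    "Cd": "5-fluorocytosine",
    "Ntr": "CB1954",
    "sacB": "sucrose",
    "rpsL": "streptomycin (wild-type rpsL confers sensitivity)",
}


def plan_shuffle(required_gene, lethal_gene):
    """Describe the two selections, refusing to reuse one gene for both roles."""
    if lethal_gene not in COUNTERSELECTION:
        raise ValueError(f"no counterselection listed for {lethal_gene}")
    if required_gene == lethal_gene:
        raise ValueError("use different genes for positive and negative selection")
    return (f"1. select transformants that express {required_gene}; "
            f"2. counterselect the covering plasmid with {COUNTERSELECTION[lethal_gene]}")


print(plan_shuffle("TRP1", "URA3"))
```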
As well as (a) the essential gene, (b) the conditionally-required gene, and (c) the optional 
heterologous gene, plasmids of the invention will typically include one or more of the following 
elements: (i) an origin of replication functional in a host cell of interest (e.g. functional in yeast, such 
as an ARS1 element or, more preferably, a 2μ ori element); (ii) a polylinker or multi-cloning site, 
containing a plurality (e.g. 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) of restriction sites in the same or, 
preferably, in different reading frames, e.g. see Figure 4; (iii) a transcription termination sequence 
(e.g. T-ADH1, T-CYC1, etc.) and/or additional stop codons (TGA, TAA and/or TAG) downstream of 
one or more (preferably all) of the promoters and their coding sequences in the plasmid; and (iv) a 
stabilising sequence, such as stb. Transcription termination sequences can be included as part of a 
heterologous insertion rather than as part of a starting vector. 
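The value of placing polylinker sites in different reading frames can be checked by computing each site's frame. In this Python sketch the polylinker is hypothetical (built from the BamHI, EcoRI and XhoI recognition sequences with single spacer bases), and position 0 of the polylinker is assumed to be in frame with an upstream coding sequence such as a purification tag. A site whose start position is 0, 1 or 2 modulo 3 then offers a different reading frame for the insert.

```python
# Hypothetical polylinker built from three common recognition sequences.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "XhoI": "CTCGAG"}
MCS = "GGATCC" + "A" + "GAATTC" + "A" + "CTCGAG"


def site_frames(mcs, sites):
    """Map each enzyme to (position, frame) pairs; frame = position mod 3."""
    report = {}
    for enzyme, motif in sites.items():
        hits = [i for i in range(len(mcs) - len(motif) + 1) if mcs.startswith(motif, i)]
        report[enzyme] = [(i, i % 3) for i in hits]
    return report


frames = site_frames(MCS, SITES)
print(frames)  # {'BamHI': [(0, 0)], 'EcoRI': [(7, 1)], 'XhoI': [(14, 2)]}
unique = all(len(hits) == 1 for hits in frames.values())
frames_used = {frame for hits in frames.values() for _, frame in hits}
print(unique, sorted(frames_used))  # True [0, 1, 2]
```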

To function as a shuttle vector between eukaryotes and bacteria, thereby simplifying preparative 
work, the plasmid may also include one or more of: (v) an origin of replication functional in bacteria, 
such as the ColE1 origin of replication; and (vi) an antibiotic resistance marker suitable for selection 
of bacterial transformants. As an alternative to using bacteria for preparative work, gap repair cloning 
[24] can be used. 

Where a vector is for bacterial expression and is used in a shuffling procedure, an intermediate cell of 
the invention will include both a covering plasmid and an expression plasmid. The origins of 
replication in these plasmids should belong to different incompatibility groups to ensure that they can 
occupy the same cell during shuffling (e.g. one ColE1-based plasmid and one p15A-based plasmid). 
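The compatibility requirement reduces to comparing incompatibility groups. In this Python sketch the group table is an illustrative assumption covering a few common E.coli replicons (ColE1-type origins such as pMB1 share one group; p15A and pSC101 each form their own), and `can_coexist` is a hypothetical helper.

```python
# Illustrative incompatibility groups for common E.coli replicons (assumption:
# ColE1-type origins such as pMB1 share one group; p15A and pSC101 are separate).
INCOMPATIBILITY_GROUP = {"ColE1": "ColE1", "pMB1": "ColE1", "p15A": "p15A", "pSC101": "pSC101"}


def can_coexist(ori_a, ori_b):
    """Two plasmids can be maintained together only if their origins differ in group."""
    return INCOMPATIBILITY_GROUP[ori_a] != INCOMPATIBILITY_GROUP[ori_b]


print(can_coexist("ColE1", "p15A"))  # True: the covering/expression pair in the example
print(can_coexist("ColE1", "pMB1"))  # False: same group, one plasmid would be lost
```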

Heterologous genes 

Plasmids used in cells of the invention, and in intermediate cells, include a heterologous gene, i.e. a 
gene not naturally expressed in the organism in which the plasmid is propagated. Transcription of the 
heterologous gene will generally be under the control of a promoter that is functional in the host cell, 
as expression of the gene cannot be achieved using a promoter that is inactive in the cell. 

The heterologous gene preferably comprises a coding sequence from a eukaryote, more preferably 
from a higher eukaryote. For example, the heterologous gene may comprise an animal sequence, e.g. 
from a mammal, such as a human sequence. As an alternative, the heterologous gene may comprise a 
coding sequence from a virus (preferably a eukaryotic virus), a parasite, a pathogenic bacterium, etc. 

Various types of heterologous genes can be used: (a) one type of heterologous gene is a sequence 
which encodes a polypeptide that is useful during protein purification, and to which a further 
sequence of interest may be fused to give fusion polypeptides; (b) a second type of heterologous gene 
is a sequence which encodes a fusion polypeptide, comprising a sequence useful during protein 
purification, fused to a further sequence of interest; (c) a third type of heterologous gene is a 
sequence of interest without any fusion sequence. Fusion expression (b) of a protein of interest is 
typical, but direct expression (c) is also useful. A gene sequence useful during protein purification (a) 
will not typically be expressed as a protein for its own sake but will be used as a starting material for 
preparing a fusion construct (b). 

Polypeptides commonly used as fusion partners to assist in purification include, but are not limited 
to: glutathione-S-transferase (GST), purified using immobilised glutathione [25]; poly-histidine tags, 
purified by IMAC [26]; calmodulin-binding peptide (CBP), purified using immobilised calmodulin; 
maltose-binding protein (MBP), purified using immobilised amylose; a chitin-binding domain 
(CBD), purified by binding to chitin; secretory signals; and the Flag epitope (DYKDDDDK) (SEQ ID 
NO: 1) [27], haemagglutinin epitope (YPYDVPDYA, HA-tag) (SEQ ID NO: 2), VSV-G epitope, 
thioredoxin or c-myc epitope (EQKLISEEDL) (SEQ ID NO: 3), purified by specific immunoaffinity 
chromatography. Thus a plasmid of the invention may include a sequence that encodes one of these 
polypeptides, optionally fused to a further sequence of interest. These two elements may be arranged 
in either order, N-terminus to C-terminus, but it is typical to have the further sequence 
downstream of (i.e. fused to the C-terminus of) the purification sequence. 

The ability to express proteins as GST-fusions is an advantage over Pichia systems, as GST-fusions 
in Pichia typically fail to bind to immobilised glutathione. The ability to use poly-histidine tags is 
also an advantage over Pichia, where alcohol dehydrogenase protein co-purifies on IMAC columns. 
The invention avoids these difficulties. 

Where the heterologous sequence is designed for fusing to further sequences, or where it is fused to a
further sequence, it is typical to include a protease recognition sequence at the junction between the
two (i.e. at or near the 3' or 5' end of the heterologous sequence). A protease can then be used to
generate the protein of interest without its purification tag. The proteolytic cleavage can take place
after purification of the fusion protein or, to simplify purification, can take place while the fusion
protein is immobilised on an affinity column, allowing the cleaved protein of interest to elute while
the purification tag remains immobilised. Protease recognition sites include, but are not limited to:
VPR/GS (SEQ ID NO: 4) (Thrombin); IEGR (SEQ ID NO: 5) (Factor Xa Protease); DDDDK
(SEQ ID NO: 6) (Enterokinase); ENLYFQ/G (SEQ ID NO: 7) (endopeptidase rTEV from tobacco
etch virus); and LEVLFQ/GP (SEQ ID NO: 8) (human rhinovirus protease 3C). As an alternative to
using a protease recognition sequence, a self-cleaving protein can be constructed based on inteins
[28,29].
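As an illustration only, the recognition sequences listed above can be used to predict where a fusion protein would be cut. The sketch below is a hypothetical helper, not part of the invention: it treats "/" as the scissile bond and, for Factor Xa (IEGR) and enterokinase (DDDDK), assumes cleavage immediately after the last residue of the motif. It ignores secondary specificity and partial digestion, and the fusion sequence used is invented for the example.

```python
import re

# Motifs as listed in the text; "/" marks the scissile bond.
# Assumption: Factor Xa and enterokinase cut right after the motif's last residue.
PROTEASE_SITES = {
    "thrombin": "VPR/GS",
    "factor_xa": "IEGR/",
    "enterokinase": "DDDDK/",
    "tev": "ENLYFQ/G",
    "hrv_3c": "LEVLFQ/GP",
}


def cleavage_positions(protein: str, site: str) -> list[int]:
    """0-based indices of the first residue of each C-terminal cleavage product."""
    motif = site.replace("/", "")
    offset = site.index("/")
    # A zero-width lookahead also catches overlapping occurrences of the motif.
    pattern = "(?=" + re.escape(motif) + ")"
    return [m.start() + offset for m in re.finditer(pattern, protein)]


def cleave(protein: str, site: str) -> list[str]:
    """Fragments expected after complete digestion at every matching site."""
    bounds = [0, *cleavage_positions(protein, site), len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:])]


fusion = "HHHHHH" + "ENLYFQG" + "MSTNPK"  # hypothetical His6-TEV-protein fusion
print(cleave(fusion, PROTEASE_SITES["tev"]))  # ['HHHHHHENLYFQ', 'GMSTNPK']
```

In this toy case the tag and most of the linker stay on the N-terminal fragment, while a single glycine remains on the released protein of interest.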

Prior to use with the invention, the heterologous gene will be prepared in a form suitable for insertion
into a vector of the invention. This may be by digestion of nucleic acid containing the gene, using
enzymes that are compatible with the insertion site in the vector of the invention, or by addition of
suitable sequences during preparation, e.g. by PCR amplification.

The insert may be suitable for ligase-independent cloning ('LIC' [30-32]). For example, the 5' and 3'
regions of the insert may have long stretches (e.g. >15 nucleotides) with a high level of sequence
identity to the ends of the linearised vector (usually long sticky ends), thereby facilitating insertion
of the sequence into the vector without needing ligase (or phosphatase).
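The sequence-identity requirement for ligase-independent cloning can be checked before ordering primers. The minimal sketch below assumes a simplified picture in which the insert must begin with the last bases of the left vector arm and end with the first bases of the right vector arm; the default of 16 nucleotides encodes ">15 nucleotides", and all sequences are hypothetical. It does not model the enzymatic treatment that actually generates the single-stranded ends.

```python
def terminal_overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def lic_compatible(left_arm: str, insert: str, right_arm: str, min_overlap: int = 16) -> bool:
    """True if both vector/insert junctions share at least `min_overlap` identical bases."""
    left = terminal_overlap(left_arm.upper(), insert.upper())
    right = terminal_overlap(insert.upper(), right_arm.upper())
    return left >= min_overlap and right >= min_overlap


# Hypothetical sequences with 18 nt of identity at each junction.
LEFT_ARM = "GGGG" + "ACGTACGTACGTACGTAC"
INSERT = "ACGTACGTACGTACGTAC" + "ATGAAA" + "TTTGGGCCCAAATTTGGG"
RIGHT_ARM = "TTTGGGCCCAAATTTGGG" + "CCCC"

print(terminal_overlap(LEFT_ARM, INSERT), terminal_overlap(INSERT, RIGHT_ARM))  # 18 18
print(lic_compatible(LEFT_ARM, INSERT, RIGHT_ARM))  # True
```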

The insert sequence may be directly from a natural gene, or may have been modified in some way 
e.g. to remove introns, to change codon usage, to introduce or remove restriction sites, etc. 

The invention has been found to be particularly suitable for expression of proteins which have been
difficult to express in existing systems. Lte1 (low temperature essential) [33] is a large yeast protein
(>1400 amino acids) which cannot be expressed in E.coli, but using the invention it has been
successfully expressed in soluble form as a GST-fusion (in both directions, N-terminus to
C-terminus). Thus the heterologous gene may encode a protein with 300 or more amino acids (e.g.
350, 400, 450, 500, 600, 700, 800, 900, 1000 or more), although expression of proteins shorter than
300 amino acids (e.g. 200 or fewer amino acids) is not excluded. Yeast proteins Bfa1 and Bub2 are
found naturally at low levels and were subject to considerable degradation in E.coli expression
systems [34], but have now been expressed at high levels in soluble form as GST-fusions. Expression
of yeast kinases CDC5, CDC15 and CDC28 in E.coli gives inactive proteins, but these three proteins
have been expressed in active soluble form as GST-fusions in yeasts having chromosomal deletions
of the corresponding genes. Mammalian proteins such as Tpl2 have also been successfully expressed as
GST-fusions. Some of these proteins have subsequently been prepared in pure form after thrombin
cleavage to remove the GST moiety. Likewise, soluble SARS virus Nsp13 gene product, a putative
mRNA Cap1 methyltransferase, has been expressed and cleaved from the GST affinity purification
tag using human rhinovirus protease 3C.

Thus the heterologous gene is preferably expressed as a soluble protein, even in fusion form. The
production of soluble proteins is an advantage when compared to bacterial expression systems.

Following expression according to the invention, proteins may adopt their native dimeric form in
solution. Thus the heterologous gene may encode a protein which naturally forms an oligomer, such
as a dimer, trimer, tetramer, pentamer, hexamer, etc.

For hetero-oligomeric proteins, it is possible to express multiple heterologous genes from the same
plasmid, but it is preferred to use one plasmid per heterologous gene, in which case the invention
generally uses one essential gene per monomer, i.e. the chromosome of a host for expressing a
hetero-dimer will have two inactive essential genes, with their functions being complemented by
different plasmids. Stoichiometric expression can be achieved if the same promoter is used for each
monomer, provided that the plasmids' copy numbers are the same.

The heterologous gene is generally different from the essential gene.

Control of gene expression

Plasmids for use with the invention include (a) an essential gene, and (b) a conditionally-required
gene and/or a conditionally-lethal gene. For expression purposes, plasmids of the invention also
include a heterologous gene. Expression of these genes is controlled by upstream promoters. Various
promoters may be used, but the invention offers better expression if particular promoters are used.

The essential gene is preferably under the control of a repressible promoter. To increase expression
levels, the invention exploits the background level of "leaky" expression driven by such promoters
even when they are turned "off", e.g. by catabolite repression. As the essential gene is required for the
host cell to survive, but the host cell does not have a copy of the essential gene on its own
chromosome, there is a selective pressure to increase the plasmid's copy number. As the copy
number increases, the overall expression of the essential gene increases such that the combined
background expression is adequate for survival.

By repressing expression of the essential gene, therefore, the invention can achieve a high copy
number of the plasmid. An increase in copy number also gives increased levels of the heterologous
gene, thereby improving expression levels of the protein of interest. The process of the invention
may thus include a step of increasing the copy number of a vector to at least 5 (e.g. to at least 10, 20,
30, 40, 50 or more). The use of "leaky" low-level expression to increase copy number is known [35].
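The copy-number argument above can be made concrete with a deliberately simple model. The sketch assumes, hypothetically, that each plasmid copy contributes a fixed amount of leaky essential-gene expression, that survival requires a fixed total, and that induced heterologous output scales linearly with copy number. The arbitrary units are illustrative only, not measured values.

```python
import math


def min_copy_number(required_level: float, leaky_level_per_copy: float) -> int:
    """Smallest copy number whose summed leaky expression meets the survival threshold."""
    if leaky_level_per_copy <= 0:
        raise ValueError("leaky expression per copy must be positive")
    return math.ceil(required_level / leaky_level_per_copy)


def relative_yield(copies: int, induced_level_per_copy: float) -> float:
    """Heterologous output after induction, assumed proportional to copy number."""
    return copies * induced_level_per_copy


# Hypothetical units: survival needs 100; the repressed promoter leaks 4 per copy.
copies = min_copy_number(100, 4)
print(copies)                        # 25
# Weaker output per copy (e.g. from non-optimal codons) pushes the minimum higher.
print(min_copy_number(100, 2))       # 50
print(relative_yield(copies, 10.0))  # 250.0
```

Under these assumptions, halving the leaky output per copy doubles the copy number the cell must maintain, which in turn doubles the heterologous yield on induction.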

Copy number amplification can be further enhanced by using codons in the essential gene which are
non-optimal for the host in question. Where further enhancement of this type is not required,
however, the essential gene may be modified for optimum codon usage.

The heterologous gene is preferably under the control of a promoter that is both repressible and
inducible. Rather than being used to increase copy number, however, this promoter is used to allow
controlled expression of the protein of interest. When there is an increase in copy number of the
plasmid, high levels of heterologous protein expression are achieved. It is thus useful to avoid
expression of the heterologous gene until a desired time, to avoid possible toxic effects of
over-expression. For example, if Bfa1 or Clb6 is over-expressed then cells die. Thus the heterologous
gene may encode a protein that is potentially toxic to the host during normal growth.

A typical repressible promoter system for use with the invention is based on the GAL1-10 promoters
of Gal1 (galactokinase) and Gal10 (UDP-glucose 4-epimerase). These are tightly repressed by glucose
but highly activated when galactose is the sole carbon source. In S.cerevisiae, the dual GAL1 and
GAL10 promoters are juxtaposed in nature (within the Pgau element) and are transcribed in opposite
directions, and this arrangement of promoters conveniently allows divergent repression of the
essential gene (controlled by one of the pair, in one direction) and the heterologous gene (controlled
by the other member of the pair, in the other direction) [36].

Other repressible promoters include, but are not limited to: the repressible acid phosphatase gene
promoter (PHO5), which is activated at low inorganic phosphate levels [37,38]; the
thiamine-repressible promoter (from nmt1), which is repressed by thiamine [39,40]; the metallothionein
promoter (from MTT1), which is induced by Cd2+ [41]; the copper transport protein promoter (from
CTR3), which is repressed in the presence of copper ions [42]; and a light-switchable system involving a
DNA-binding domain fused to phytochrome and a transcription activation domain fused to PIF3, grown
in a medium containing phycocyanobilin, with red light being an activator and far-red light being a
repressor [43]. In bacteria the IPTG-inducible lac promoter can be used.

The heterologous gene and the essential gene may be controlled by separate copies of the same
promoter. Expression of the two genes is thus controlled together, although over-expression of the
heterologous gene is not generally required for the invention to function.

To express heterologous proteins according to the invention, a promoter will be activated (e.g. by
addition of an inducer or by removal of a repressor). While the expressed extra-chromosomal genes
in a cell of the invention must include the essential gene, therefore, the heterologous gene may be
expressed or non-expressed depending on prevailing circumstances.

Yeast engages its ubiquitination system to tag many proteins for degradation at the exit from G1 and
in the later stages of M phase. This tagging can interfere with the yield of some heterologous proteins
in yeast, but can be prevented by arresting cells in early G1 or M phase. Cell cycle arrest can be
achieved in various ways, including the use of α-factor or of cell cycle inhibitors such as nocodazole.
Expression methods of the invention may thus involve the use of such reagents.

During expression of the heterologous gene, a yeast may be in diploid or haploid form.

Host cells

Because all organisms have essential genes, and the invention is based on the fundamental principle
of moving an essential gene from the chromosome onto an extra-chromosomal element so that
transformants can be selected, the invention is applicable to all organisms, including prokaryotes and
eukaryotes. In particular, the availability of plasmid shuffling protocols for many organisms
facilitates the widespread use of the invention. Because bacterial expression systems are already
well-developed, however, the invention's benefits are most immediately useful in eukaryotes,
including unicellular eukaryotes (such as yeasts) and multicellular eukaryotes (such as animals and
plants). As the use of essential genes as markers avoids the need for antibiotics, however, the
invention offers advantages over conventional systems in situations where even traces of antibiotics
in the purified expression product cannot be tolerated.

The invention is particularly useful for yeasts. Yeast is an inexpensive organism to work with, can be
stored easily by freezing, and has an extensive history of use in expression and genetic
manipulation; since the sequencing of the S.cerevisiae genome, the genomics and proteomics of this
organism have been heavily exploited. Many suitable clones and vectors for expression and selection
are readily available, and these have been extensively studied and characterised. Furthermore, studies
of the yeast proteome have shown that yeasts are extremely tolerant to the expression of genes in the
form of fusion proteins, without loss of solubility or function [44,45].
Preferred yeasts are those which support plasmids and, for assisting in the preparation of cells of the
invention, which exist in haploid and diploid forms. Budding yeasts are particularly preferred.

Yeasts include the following genera: Arthroascus, Arxiozyma, Bullera, Candida, Debaryomyces,
Dekkera, Dipodascopsis, Endomyces, Eremothecium, Geotrichum, Hanseniaspora, Hansenula,
Hormoascus, Issatchenkia, Kloeckera, Kluyveromyces, Lipomyces, Lodderomyces, Metschnikowia,
Pachysolen, Pachytichospora, Pichia, Rhodosporidium, Rhodotorula, Saccharomyces,
Saccharomycodes, Schizoblastosporion, Schizosaccharomyces, Schwanniomyces, Sporobolomyces,
Sterigmatomyces, Sympodiomyces, Taphrina, Torula, Torulaspora, Torulopsis, Trichosporon,
Yarrowia, Zygohansenula and Zygosaccharomyces. Preferred genera for use with the invention are
Saccharomyces, Schizosaccharomyces and Pichia. Common industrial yeast systems include
Hansenula polymorpha, Kluyveromyces lactis, Yarrowia lipolytica, Saccharomyces carlsbergensis,
Saccharomyces ellipsoideus and Candida utilis, and particularly preferred species for use with the
invention are Saccharomyces cerevisiae (budding or baker's yeast) and Schizosaccharomyces pombe
(fission yeast [46]). Such yeasts are readily available to the skilled person.

Many E.coli strains optimised for recombinant protein expression are available, e.g. BL21 and its
derivatives.

The invention does not utilise wild-type cells as hosts, as the invention relies on the absence of an
essential gene from the host's chromosome, with that absence being complemented by an
extra-chromosomal copy of the gene. Thus the host's chromosome will be lacking a functional copy
of an essential gene. Typically, therefore, the invention will use a host that has a knockout genotype
for the essential gene in question. The knockout may remove or disrupt the whole or part of the
chromosomal gene, in the regulatory region(s) and/or the coding region(s). Thus remnants of the
essential gene may remain in the chromosome, but the overall effect will be that the host's
chromosome cannot be transcribed and/or translated to produce the essential gene product in
functional form. Knockout of essential genes is known in the prior art [e.g. 20,21], but
complementation with extra-chromosomal copies of the genes has been used to study the essential
gene itself rather than as a way of selecting for the presence of a different heterologous gene.

Knockout by homologous recombination is a preferred method for obtaining suitable host cells, and
in particular knockout by isogenic deletion. Replacement of a chromosomal gene with a marker gene
is typical, e.g. as a result of homologous recombination to insert an antibiotic resistance gene. Gene
inactivation methods such as those disclosed in references 47 and 48 can easily be adapted by the
inclusion of covering plasmids encoding an essential gene prior to the inactivation step. Other
non-knockout methods of preventing expression of an essential protein include chromatin silencing,
antisense and RNA silencing (e.g. RNAi) techniques, although such techniques are not preferred due
to their reversible nature and to the difficulty in ensuring that vector-derived genes are not also
inactivated. A further way of eliminating the chromosomal gene's function is by mutagenesis of
codons encoding critical amino acids, e.g. a single Arg-522-His mutation in the sigA gene encoding
σA in Mycobacterium smegmatis is lethal, without the need for knockout of the whole coding
sequence [49]. Thus the skilled person can readily generate a host cell in which a chosen essential
gene has been disabled, either by preventing its expression (at a transcriptional or translational
level) or by allowing its expression but in an inactive form.

In addition to knockout of the essential gene, the host may include further mutations to remove
undesirable phenotypes. These mutations may already be present in a starting yeast strain, or they
may be introduced.

For example, many host cells express endogenous proteases which degrade heterologous proteins,
but which are not essential to viability under laboratory conditions. Deletion of such proteases from
the host improves recombinant protein expression. Thus a cell of the invention may include knockout
mutations of one or more endogenous proteases. In yeast, deletion of PEP4 function (the
saccharopepsin aspartyl protease [50]) is a preferred mutation. Other proteases which can be knocked
out include Prb1, Prc1 and Cps1.

The host cell may have mutations in genes responsible for cell wall assembly, such that the cell wall is
weakened in order to simplify post-expression processing of cells. Such mutations make cells more
fragile, which may not be useful in a general laboratory bench setting, but would be very useful in a
specific expression system at an industrial scale, where simplification of downstream processing is a
higher priority than benchtop resilience.

The host cell may have mutations to prevent slow growth, e.g. deletion of cln3 or clu2 in yeast. A
preferred strain is one which is able to produce a higher biomass than wild-type yeast under the same
conditions. A mutant strain has been described which contains only a single hexose transporter, a
hybrid of Hxt1 and Hxt7 [51]. This mutation restricts glucose influx and avoids overflow into lactate.
This results in slow, steady respiration of the glucose and a higher resultant biomass.

The host cell may also include heterologous genes encoding foreign proteins, such as those from
non-native metabolic pathways. For example, heterologous glycosyltransferases and other
glycosylation enzymes (e.g. mannosidases I and II, N-acetylglucosaminyl transferases I and II,
uridine 5'-diphosphate (UDP)-N-acetylglucosamine transporter, etc.) may be expressed in order to
increase the glycosylation repertoire of an expression host [52], and in particular to mimic human
glycosylation. Native pathways may be inhibited or knocked out to assist in this approach [53].

Multiple Genes

The invention has been described above in terms of using a single essential gene as a marker. The
invention can also be used with multiple essential genes as markers. Each gene with an essential
function is (a) expressed extra-chromosomally, the expression of those genes being required for
viability of the cell, wherein (b) the expressed chromosomal genes do not provide those essential
functions. For example, preferred essential genes may include both MOB1 and CDC28. Therefore,
the chromosomal genes may have both MOB1 and CDC28 knocked out, and the functions provided
by these genes are instead provided by extra-chromosomal genes. In a further example, it is possible
for more than two essential genes to be used as markers (e.g. the chromosomal genes may have the
MOB1, CDC28 and Hsp10 genes knocked out). As mentioned above, a number of essential genes
have been described and it is possible to knock out any number of these genes on the chromosome of
the host cell. For each loss of an essential function from the chromosomal genes, that function must
be replaced by proteins expressed from the extra-chromosomal genes, otherwise the cell cannot
survive.

The extra-chromosomal genes that provide the essential function may be found on the same plasmid
as each other, or on separate plasmids. Therefore if the expressed chromosomal genes lack three
essential functions, then the extra-chromosomal genes may provide these essential functions using
one, two or three different plasmids. Therefore a single plasmid may comprise one or more (e.g. 1, 2,
3, 4, 5, 6, 7, 8, 9, 10 or more) genes with essential functions.

If the chromosomal genes have n essential genes knocked out, then there must be n
extra-chromosomal essential genes. Each cell may comprise from 1 to n different plasmids, which
together provide the function of the n different essential genes. Each of the plasmids is required by
the cell for survival. If there are fewer than n plasmids, then at least one plasmid will comprise more
than one essential gene. Loss of any of the essential extra-chromosomal genes is lethal to the cell.
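The bookkeeping rule above (every chromosomally missing essential function must be supplied by some plasmid, so a plasmid carrying a function found nowhere else is indispensable) can be expressed as a set-coverage check. The sketch below is a hypothetical illustration: gene names follow the examples in the text, and the model deliberately ignores dosage and expression level.

```python
def is_viable(knocked_out: set[str], plasmids: list[set[str]]) -> bool:
    """A cell survives only if every knocked-out essential function is on some plasmid."""
    supplied: set[str] = set().union(*plasmids)
    return knocked_out <= supplied


def indispensable_plasmids(knocked_out: set[str], plasmids: list[set[str]]) -> list[int]:
    """Indices of plasmids whose loss alone would be lethal."""
    return [
        i
        for i in range(len(plasmids))
        if not is_viable(knocked_out, plasmids[:i] + plasmids[i + 1:])
    ]


# n = 3 essential genes knocked out, supplied by two plasmids (between 1 and n plasmids).
knocked_out = {"MOB1", "CDC28", "HSP10"}
plasmids = [{"MOB1"}, {"CDC28", "HSP10"}]
print(is_viable(knocked_out, plasmids))               # True
print(indispensable_plasmids(knocked_out, plasmids))  # [0, 1]
print(is_viable(knocked_out, plasmids[:1]))           # False
```

With fewer than n plasmids, at least one set in `plasmids` has more than one member, matching the statement that some plasmid must then carry several essential genes.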

The invention may also be used to express more than one heterologous protein, and the invention is
then particularly useful for the co-expression of proteins that can interact to form complexes, e.g.
heterodimers. Each plasmid encoding an essential gene may also encode one or more (e.g. 2, 3, 4, 5,
6, 7, 8, 9, 10 or more) heterologous genes of interest.

The cell may express up to x heterologous proteins, where x can be the same as n, less than n or
greater than n, depending on whether the essential gene and/or heterologous protein is duplicated.

Preferably, for n knocked-out essential genes and n heterologous genes, the cell comprises n
plasmids, each comprising one extra-chromosomal essential gene and one heterologous gene.

Therefore, the cell of the invention may comprise at least one further extra-chromosomal gene with
an essential function that the chromosomal genes do not provide. The further extra-chromosomal
genes may also comprise at least one further heterologous gene, the expression of which is controlled
by a promoter that is functional in the cell. In such a case, loss of any of the extra-chromosomal
essential genes is lethal to the cell.

Where more than one essential function marker is used, each is replaced by carrying out the plasmid 
shuffling steps described above, once for each particular plasmid encoding an essential gene. Each 
covering plasmid and each expression plasmid should contain a different conditionally lethal 
selection marker such that their loss can be selected individually. 
For example, a cell may be a MOB1 and CDC28 knockout. Such a cell may contain two covering
plasmids, one expressing MOB1 and the other expressing CDC28. In a first plasmid shuffling step
the MOB1-encoding covering plasmid is replaced by a MOB1-encoding expression plasmid that also
expresses at least one heterologous protein, and in a second plasmid shuffling step the
CDC28-encoding covering plasmid is replaced by a CDC28-encoding expression plasmid that expresses at
least one (different) heterologous protein.

Alternatively, the cell may contain a single covering plasmid which expresses both MOB1 and
CDC28. Plasmid shuffling is then used to replace the single covering plasmid with the two
expression plasmids, each of which expresses one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more)
heterologous genes. Cells are selected which contain the two expression plasmids.

It is also possible to replace a single covering plasmid which covers two knocked-out essential genes
with a single expression plasmid that comprises both essential genes and expresses one or more (e.g.
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) heterologous genes. It is also possible to replace two covering
plasmids that comprise different essential genes with a single expression plasmid that covers both
essential genes and expresses one or more (e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) heterologous
genes.

It is also possible to carry out a similar process where more than two (e.g. 3, 4, 5, 6, 7, 8, 9, 10 or
more) essential genes, more than two (e.g. 3, 4, 5, 6, 7, 8, 9, 10 or more) heterologous genes, more
than two (e.g. 3, 4, 5, 6, 7, 8, 9, 10 or more) covering plasmids and/or more than two (e.g. 3, 4, 5, 6,
7, 8, 9, 10 or more) expression plasmids are used.

General 

The term "comprising" means "including" as well as "consisting", e.g. a composition "comprising" X
may consist exclusively of X or may include something additional, e.g. X + Y.

The term "about" in relation to a numerical value x means, for example, x±10%.

The word "substantially" does not exclude "completely", e.g. a composition which is "substantially
free" from Y may be completely free from Y. Where necessary, the word "substantially" may be
omitted from the definition of the invention.

Polypeptides 

The invention also provides polypeptides expressed by the methods of the invention. The
polypeptides expressed by the invention may be expressed as single proteins or as complexes. For
example, the polypeptides may be expressed as homo- or heterodimers. Preferably the polypeptides
expressed using the invention are not expressible using conventional techniques known in the art.
Preferred polypeptides are an Lte1 protein, a Bfa1 protein, a Bub2 protein, a CDC5 protein, a CDC14
protein, a CDC15 protein (both wild type and kinase dead), a CDC16 protein, a CDC23 protein, a
CDC28 protein, a Tpl2 protein, a SARS virus Nsp13 protein, an mRNA Cap1 methyltransferase
protein, a Cla4 protein, a Dbf2 protein, an APC1 protein, the PP2A subunits Tpdl, Pph21, Pph22, Cdc55
and Rts1, a Clb6 protein, an Rgd1 protein, a Ubc4 protein, a Plo1 protein, an HBP1 protein, a PLK1
kinase protein, a KIF2C protein, a CHO kinesin MCAK protein, a p105 protein, a human Abin2
protein, a Mob1/Dbf2 N305A dimer, a Mob1/Dbf2 dimer and a TPL2/p105 dimer.

BRIEF DESCRIPTION OF DRAWINGS 

Figure 1 illustrates the construction of starting strains for use with the invention, and figure 2 shows a
further development of this process, starting with the strain produced at the end of figure 1.

Figure 3 shows two maps of the pMG1 plasmid, with figure 4 showing its polylinker site (SEQ ID
NO: 11 and SEQ ID NO: 12).

Figure 5 shows expression from the pMG1 plasmid using glucose (5A) or galactose (5B).

Figure 6 shows the plasmid shuffling used in selecting cells of the invention. The yeast cell is shown
progressing from starting cell to intermediate cell to a cell useful for heterologous expression of
proteins according to the invention.

Figures 7 to 10 show the results of protein expression according to the invention. The lanes were
loaded with protein from ~30 ml of culture.

Figure 11 shows the MOB1/TRP1-based vectors (A) pMH919 and (B) pGSTMob/Dbf2.

Figure 12 shows a comparison of the yields of GST-Ubc4 when expression is induced with varying 
concentrations of galactose. 

Figure 13 shows the optimum glucose concentration for expression of GST-Tpl2. 

Figure 14 shows the purification of components of the S. cerevisiae mitotic exit network.

Figure 15 shows (A) purification of GST-Cla4, 6His-Lte1 and GST-Lte1, (B) phosphorylation of
6His-Lte1 by GST-Cla4 and (C) guanine nucleotide exchange activity of Lte1 (x-axis shows time in
minutes, y-axis shows % Tem1-GDP, diamonds are Bfa1+Tem1, squares are Bfa1+Tem1+Lte1).

Figure 16 shows (A) the elution of GST-Cdc15, (B) the phosphorylation of Mob1/Dbf2 by Cdc15
and (C) the activation of Mob1/Dbf2 kinase by Cdc15.

Figure 17 shows the purification and activities of GST-Mob1, wild type, kinase dead and hyperactive
Dbf2.

Figure 18 shows the purification of S. cerevisiae APC components.

Figure 19 shows the specific phosphorylation of GST-Cdc16 and GST-Apc1 by Dbf2/GST-Mob1.

Figure 20 shows the purification of GST-Cdc14 and its phosphorylation by Dbf2/Mob1.
Figure 21 shows the phosphatase activity of GST-Cdc14 (y-axis is activity, x-axis is time). Activity is
measured using absorbance at 410nm.

Figure 22 shows the phosphatase activity of wild type and mutant GST-Cdc14. Lane key: 1, wild
type; 2, 1-462; 3, 1-372; 4, 316-551; 5, 462-551; 6, GST only; 7, S464A S467A; and 8, S494A S496A
S497A S498A.

Figure 23 shows (A) the purification of GST-Net1 and (B) the inhibition of Cdc14 activity by Net1
(x-axis shows time in minutes, y-axis shows phosphatase activity [OD410nm], diamonds are
GST-Cdc14, squares are GST-Cdc14+GST-Net1).

Figure 24 shows the purification of the five subunits of S. cerevisiae protein phosphatase 2A.

Figure 25 shows the phosphatase activity of PPH2A (y-axis is activity, x-axis is time). Activity is
measured using absorbance at 410nm.

Figure 26 shows the purification of GST-Clb6 cyclin box fragments.

Figure 27 shows the purification of GST-Rgd1.

Figure 28 shows the large scale preparation of GST-Ubc4. Key: B, beads before elution; R, beads
after elution.

Figure 29 shows the phosphorylation of MBP by S.pombe GST-Plo1.

Figure 30 shows (A) the purification of mouse GST-Hbp1 and (B) the purification of SARS virus
GST-Nsp13 methyltransferase.

Figure 31 shows the purification of three GST-polo domain fragments from human polo-like kinase.

Figure 32 shows the purification of the kinesins KIF2C and MCAK.

Figure 33 shows (A) the expression of rat GST-Tpl2 and N- and C-terminal deletion derivatives, (B)
human 6His-p105 and (C) human GST-Abin2.

Figure 34 shows the elution of GST-Tpl2. 

Figure 35 shows the interaction of GST-Tpl2 and 6His-p105.

Figure 36 shows vector maps of (A) pMH925 and (B) pMH927.

Figure 37 shows the coexpression and copurification of GST-Tpl2 and 6His-p105.

MODES FOR CARRYING OUT THE INVENTION 
Construction of starting yeast strains

Diploid S.cerevisiae strains that are heterozygous for MOB1 (MOB1/mob1::kanR) are available. Such
a strain was obtained and was transformed with a pURA3 plasmid ("pRS316" [54]) carrying a
BamHI-EcoRI PCR fragment encompassing the entire MOB1 coding sequence plus flanking
regulatory elements [15]. This strain is gal2 (has sub-optimal growth on galactose as a sole carbon
source) and is Ura- (requires uracil in growth medium). Ura+ transformants were selected and
allowed to sporulate. After germination, haploid mob1::kanR strains were selected using G418. These
cells have lost their chromosomal MOB1, but its activity is complemented by the MOB1 plasmid.
These cells were mated with a second haploid strain ("CG379" [55]) which was MOB1 trp1 GAL2,
and the mated diploid cells were then sporulated. Spores which were trp1 GAL2 mob1::kanR (cannot
grow without tryptophan, can grow on galactose, G418 resistant) were selected for G418 resistance
and growth on galactose medium. One which was mating type a was designated MGY66 and had the
following relevant genotype: MATa mob1::kanR trp1 GAL ura3 pURA3-MOB1. MGY66 is a suitable
starting cell for use with the invention, and its overall construction is shown in Figure 1.

As a further development, shown in Figure 2, the PEP4 gene of this strain was knocked out and
replaced with a LEU2 cassette [56]. The resulting strain is referred to as "MGY70" and is MATa
mob1::kanR trp1 GAL pep4::LEU2 ura3 pURA3-MOB1. The PEP4 gene encodes an aspartyl
protease ("saccharopepsin") which can degrade recombinantly-expressed proteins, but which is not
essential for cell survival, and so its deletion can improve yields of stable recombinant proteins.

Preparation of expression plasmids

Starting with plasmid pESC-URA (Invitrogen™), a PvuI fragment was excised, which contains the
divergent, conditional and galactose-inducible yeast GAL1-10 promoters and yeast ADH and CYC1
terminators. This fragment was used to replace a PvuI fragment of pRS424 [57] to give "pESC-424".

An EcoRI-SpeI fragment encompassing the MOB1 coding sequence was made by PCR of yeast
genomic DNA using the following primers:

Fwd, with EcoRI site: CCCGAATTCATGTCTTTTCTACAAAAT (SEQ ID NO: 9) 

Rev, with SpeI site: CCCACTAGTCTACCTATCCCTCAACTCC (SEQ ID NO: 10)
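As a quick consistency check on the two primers (not a step described in the original protocol), the sketch below locates the EcoRI (GAATTC) and SpeI (ACTAGT) recognition sequences and confirms that the forward primer places an ATG start codon directly after the EcoRI site, while the gene-specific part of the reverse primer, read as its reverse complement, ends in a TAG stop codon. Both recognition sequences are palindromic, so scanning one strand is sufficient. The helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(seq: str, site: str) -> list[int]:
    """0-based start positions of every exact occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


FWD = "CCCGAATTCATGTCTTTTCTACAAAAT"   # SEQ ID NO: 9
REV = "CCCACTAGTCTACCTATCCCTCAACTCC"  # SEQ ID NO: 10

ecori = find_site(FWD, "GAATTC")
spei = find_site(REV, "ACTAGT")
print(ecori, FWD[ecori[0] + 6:ecori[0] + 9])               # [3] ATG
print(spei, reverse_complement(REV[spei[0] + 6:])[-3:])    # [3] TAG
```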

The PCR fragment was cloned downstream of the GAL10 promoter of pESC-424 to give "pESC-424-MOB1".
The same EcoRI site was then removed by filling in with Klenow DNA polymerase, to give
"pESC-424-MOB1-ΔEcoRI". Removal of this EcoRI site allowed a unique EcoRI site to be included later
in a polylinker.
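Filling in an EcoRI end with Klenow polymerase and religating destroys the site because EcoRI cuts G^AATTC, leaving a four-base 5' AATT overhang on each end; filling in both overhangs and joining the blunt ends duplicates those four bases. The sketch below models only the top-strand sequence, assumes exactly one EcoRI site, and uses a hypothetical input sequence.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1     # EcoRI cuts between G and A: G^AATTC
OVERHANG_LEN = 4   # leaving a 5' AATT overhang on each end


def fill_in_and_religate(seq: str) -> str:
    """Top-strand result of EcoRI digestion, Klenow fill-in and blunt-end religation."""
    if seq.count(ECORI_SITE) != 1:
        raise ValueError("sketch assumes exactly one EcoRI site")
    cut = seq.index(ECORI_SITE) + CUT_OFFSET
    overhang = seq[cut:cut + OVERHANG_LEN]
    # Each filled-in end carries a copy of the overhang, so it appears twice after ligation.
    return seq[:cut] + overhang + seq[cut:]


before = "CCCGAATTCATG"
after = fill_in_and_religate(before)
print(after)                # CCCGAATTAATTCATG
print(ECORI_SITE in after)  # False
```

The 4-bp insertion (GAATTC becomes GAATTAATTC) no longer contains the recognition sequence, which is why a later EcoRI site in the polylinker can be unique.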

A BglII-XhoI fragment containing a GST coding sequence, a thrombin cleavage site and a polylinker
was made by PCR of pGEX-KG [58] and cloned between the BamHI and XhoI sites of
pESC-424-MOB1-ΔEcoRI, to give the plasmid "pMG1" (Figures 3A & 3B). The polylinker site (Figure 4) can
receive genes encoding proteins of interest for expression as GST-fusions.

The plasmid pMH919 (Figure 11A) was prepared using similar methods known in the art. The
polylinker site of pMH919 can receive genes encoding proteins of interest for expression as
6His-fusions.
Transformation to express recombinant proteins (Figure 6)

Plasmid pMG1 is grown in E.coli and a plasmid DNA miniprep is prepared. Separately, a gene
encoding a heterologous protein of interest is prepared which, after restriction enzyme treatment, will
have sticky ends that are compatible and in-frame with the polylinker site in pMG1. The two
molecules are digested and ligated to give a plasmid encoding the protein of interest in the form of a
GST-fusion protein. This plasmid ("pMG1-X") is transferred into MGY70 yeast by the lithium
acetate protocol, and is then selected on a minimal medium lacking tryptophan. As MGY70 is trp1-,
only transformants survive. Next, the cells are grown on agar with uracil and 1mg/ml 5-fluoroorotic
acid, which selects against URA3+ cells. Surviving cells are those which have lost the pURA3-MOB1
plasmid, but which have retained pMG1-X as the sole source of MOB1.
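The two selection steps amount to a plasmid shuffle, and their logic can be captured in a toy model: each cell is a set of plasmids, and a medium either supplies or withholds tryptophan and uracil and may contain 5-FOA. This is an illustrative sketch only, assuming the genotype given above (mob1 deleted, trp1, ura3) and ignoring growth rates and plasmid loss frequencies; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str
    genes: frozenset


# Both plasmids carry MOB1; the host's chromosomal MOB1 is deleted.
COVERING = Plasmid("pURA3-MOB1", frozenset({"URA3", "MOB1"}))
EXPRESSION = Plasmid("pMG1-X", frozenset({"TRP1", "MOB1"}))


def survives(cell: frozenset, medium: set) -> bool:
    """Toy viability rule for a mob1-deleted, trp1, ura3 host cell."""
    genes = set().union(*(p.genes for p in cell))
    if "MOB1" not in genes:
        return False  # MOB1 is essential
    if "TRP1" not in genes and "tryptophan" not in medium:
        return False  # trp1 host needs TRP1 or added tryptophan
    if "URA3" not in genes and "uracil" not in medium:
        return False  # ura3 host needs URA3 or added uracil
    if "URA3" in genes and "5-FOA" in medium:
        return False  # Ura3 converts 5-FOA into a toxic product
    return True


def select(population, medium):
    """Keep only the cells that can grow on `medium`."""
    return [cell for cell in population if survives(cell, medium)]


# Cells present after transformation and some non-selective growth.
population = [
    frozenset({COVERING}),              # untransformed
    frozenset({COVERING, EXPRESSION}),  # transformant
    frozenset({EXPRESSION}),            # transformant that lost the covering plasmid
]
# Step 1: minimal medium lacking tryptophan (uracil assumed present).
step1 = select(population, {"uracil"})
# Step 2: agar with uracil and 1mg/ml 5-FOA.
step2 = select(step1, {"uracil", "5-FOA"})
print([sorted(p.name for p in cell) for cell in step2])  # [['pMG1-X']]
# Rich medium such as YEP supplies both tryptophan and uracil.
print(survives(step2[0], {"tryptophan", "uracil"}))      # True
```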

The final transformants can be grown in rich media (e.g. in YEP medium) without further selection.
The cells require uracil to grow, but this is supplied by rich media. The cells can be frozen at this
stage to provide long-term stocks, e.g. by freezing at -80°C in YEP medium with 20% glycerol.

Expression of the heterologous fusion protein can be induced by switching on the pGAL promoters. 

Protein expression and purification

Yeast cells of the invention contain a heterologous gene under the control of a pGAL promoter.
MOB1 is also under the control of a pGAL promoter. This arrangement allows a very high copy
number of the pMG plasmid to be achieved prior to expression of the heterologous gene, thereby
giving high expression levels. Furthermore, by keeping the heterologous gene in an "off" state at this
stage, any possible toxic effects of the heterologous gene are avoided.

Cells need MOB1 expression to survive. As the MOB1 gene is under the control of a pGAL
promoter, which is repressed when cells are grown on glucose, it would seem on paper that the cells
would die when grown on glucose. As repression is not 100% efficient, however, there is low-level
basal expression from the pGAL promoters (Figure 5A). This basal expression provides low levels of
Mob1 to the growing cells, allowing survival. Moreover, the absolute need for Mob1 operates as a
selection pressure to increase the copy number of pMG1. In the presence of glucose, therefore, the
copy number of pMG1 increases to high levels.

When expression of the heterologous protein is desired, the cells are transferred to a galactose
medium. The absence of glucose and presence of galactose relieve repression of the pGAL
promoters, and expression of the heterologous protein is thus induced (Figure 5B). Furthermore, the
recombinant gene is expressed at even higher levels because of the high copy number resulting from
the pGAL-controlled MOB1 selection.

After induction, cells are grown and then harvested. The cell lysate is applied to a glutathione
column, which retains the GST-fusion protein. After washing, thrombin is added to the column,
leading to elution of the cleaved heterologous protein in pure form.
Expression of murine TPL2

This transformation/expression/purification process was followed for murine TPL2 protein. 

A pcDNA3 vector carrying the cDNA of the complete mouse TPL2 coding sequence was used as a
PCR template to generate a DNA fragment suitable for cloning into pMG1. The PCR forward
primer included the first 18 coding bases of TPL2 preceded by a synthetic BamHI site. The BamHI
site was designed so that the TPL2 sequence was in frame with the 3' end of the GST sequence of
pMG1. The reverse primer had the last 18 bases of the negative strand in reverse 5'-3' orientation
preceded by a synthetic XhoI site. The PCR product was prepared for digestion using the Wizard
PCR Preps DNA Purification System. The PCR fragment and pMG1 were digested with BamHI and
XhoI restriction enzymes. The PCR fragment was again purified using the Wizard PCR Preps DNA
Purification System. The digested vector was electrophoresed through a 10% agarose TAE-buffered
gel. Linear plasmid was excised from the gel and purified from the agarose using a Geneclean Kit.
Vector and PCR fragments were ligated together by incubation for 2h. Control ligations
were done with no insert DNA.

Ligation mixtures were transformed into E.coli DH10B. Transformed E.coli were selected on L agar
containing 20µg/ml ampicillin + 20µg/ml nafcillin. Individual clones were colony purified by
restreaking on amp+naf selective medium. Miniprep DNA of individual clones was prepared using
the Wizard Plus Minipreps DNA Purification System. Miniprep DNA was digested with BamHI +
XhoI restriction enzymes to identify clones carrying the ~1.6kb TPL2 coding sequence.

The DNA of three potentially positive pMG1-TPL2 clones was transformed into S.cerevisiae
MGY70 using the lithium acetate procedure. MGY70 transformants with this TRP1 plasmid were
selected by growth at 30°C on minimal agar medium lacking tryptophan. Two individual
transformant clones obtained from each miniprep DNA sample were colony purified by re-streaking
on agar medium lacking tryptophan. A single colony from each of these plates was streaked onto
minimal medium supplemented with 20µg/ml uracil and 1mg/ml FOA. FOA plates were incubated
for 2-3 days at 30°C. Single colonies were picked onto fresh FOA plates and grown for a further 2-3
days. In these cells the covering plasmid in MGY70 that provided the essential MOB1 gene had been
replaced by the expression plasmid and its copy of MOB1. From this point onwards these cells could
be grown on rich medium with no further conditional selection.

Examples of the resulting single colonies were next tested for protein expression. However, at this
stage it was useful to test whether expression of the cloned gene is toxic, as this influences the
induction regime for inducible gene expression. Induction of toxic gene products is indicated by
failure of the cells to grow on rich agar medium with 2% galactose as carbon source. Induction of the
potential TPL2 clones was not toxic as judged by this simple test.

Three potential isolates originating from three independent ligation events were tested for expression
of TPL2. 50ml overnight cultures were grown at 30°C in rich (YEP) medium with 2% raffinose as
carbon source. The cultures were inoculated so that cell density after overnight growth was
approximately 5x10^ /ml. The overnight cultures were used to inoculate 500ml of YEP medium
supplemented with 2% galactose as carbon source and grown for 6-8h at 30°C. Cells from 50ml and
450ml of culture were harvested by centrifugation, frozen rapidly on dry ice and stored at -80°C. The
small pellets were used to check for induced expression of TPL2 while the larger pellets were held in
reserve for preparation of TPL2 for experimental use.

Small pellets were resuspended in 400µl of lysis buffer (50mM Tris-HCl pH 7.5, 250mM NaCl, 1%
Nonidet P40, 10% glycerol, 4mM dithiothreitol, 200µg/ml sodium orthovanadate, 10mM NaF,
50mM glycerol-2-phosphate, 1mM PMSF, 'Complete' protease inhibitor (Roche™)). For cell lysis,
glass beads, 0.5mm diameter, were added up to the meniscus in 2ml screw cap tubes, which were then
shaken three times for 10sec in a RiboLyser apparatus (Hybaid™). Cell lysate was recovered by piercing
the base of the tube, followed by centrifugation inside a larger tube. Cell debris and insoluble
material were removed by 2x15 min centrifugation at 13000 rpm in a refrigerated microcentrifuge.
The cleared lysate was added to 50µl of glutathione sepharose beads which had been pre-equilibrated
in 250mM NaCl, 50mM Tris-HCl pH 7.5, 0.2% Nonidet P40. The beads were gently mixed with the
lysate on a rotor at 4°C for 1-2h. The beads were washed 5x with 250mM NaCl, 50mM Tris-HCl pH
7.5, 0.2% Nonidet P40, 4mM dithiothreitol. Proteins bound to the glutathione sepharose beads were
analysed by SDS-polyacrylamide gel electrophoresis. Protein bands were visualised by staining with
Coomassie blue (Figure 10).
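Making up the lysis buffer from stock solutions is a routine C1V1 = C2V2 calculation. The sketch below works out stock volumes for 50ml of the buffer listed above. The final concentrations are taken from the text, but every stock concentration is a HYPOTHETICAL choice made for illustration, and the 'Complete' protease inhibitor is left out because it is dosed according to the manufacturer's instructions rather than by concentration.

```python
# component: (final concentration from the text,
#             HYPOTHETICAL stock concentration in the same unit, unit)
RECIPE = {
    "Tris-HCl pH 7.5": (50, 1000, "mM"),
    "NaCl": (250, 5000, "mM"),
    "Nonidet P40": (1, 10, "%"),
    "glycerol": (10, 50, "%"),
    "dithiothreitol": (4, 1000, "mM"),
    "sodium orthovanadate": (200, 20000, "ug/ml"),
    "NaF": (10, 1000, "mM"),
    "glycerol-2-phosphate": (50, 1000, "mM"),
    "PMSF": (1, 100, "mM"),
}


def stock_volumes(total_ml: float) -> dict:
    """Volume (ml) of each stock for `total_ml` of buffer, from C1*V1 = C2*V2."""
    return {name: final * total_ml / stock
            for name, (final, stock, _unit) in RECIPE.items()}


vols = stock_volumes(50)
for name, ml in vols.items():
    print(f"{name:22s} {ml:6.2f} ml")
water = 50 - sum(vols.values())  # make up to volume with water
print(f"{'water (to volume)':22s} {water:6.2f} ml")
```

Labile components such as dithiothreitol and PMSF are usually added to the buffer just before use.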

Large cell pellets were resuspended in lysis buffer (approximately 10ml/1g cells). Cells were lysed
with a French pressure cell operating at 20000psi. Cleared lysates were made by centrifugation at
15000g for 2x20 min at 4°C. Large scale affinity purification of GST-TPL2 was essentially as
described above, except that appropriately increased amounts of reagents were used.
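The small-scale clearing spin is given in rpm (13000 rpm in a microcentrifuge) while the large-scale spin is given as relative centrifugal force (15000g). The standard conversion RCF = 1.118e-5 × r × rpm², with r the rotor radius in cm, links the two. The rotor radius below is a HYPOTHETICAL value; the real figure depends on the instrument and rotor used.

```python
import math

K = 1.118e-5  # constant in RCF = K * r(cm) * rpm**2


def rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at `radius_cm` from the rotor axis."""
    return K * radius_cm * rpm ** 2


def rpm_for(g: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach `g` at `radius_cm`."""
    return math.sqrt(g / (K * radius_cm))


RADIUS_CM = 8.4  # HYPOTHETICAL microcentrifuge rotor radius
print(round(rcf(13000, RADIUS_CM)))      # about 15871 x g
print(round(rpm_for(15000, RADIUS_CM)))  # about 12638 rpm
```

Under this assumed radius the two clearing spins are of similar force.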

In contrast to the successful expression of TPL2 using the system of the invention, attempts to
express the protein in E.coli using the pGEX-4T and pET28 plasmids failed. The attempts used the
full length protein as well as deletion derivatives lacking the N-terminal 30 residues and/or the
C-terminal 70 residues (an oncogenic form). The kinase domain on its own was also tested. In all
cases, however, any product which was seen (very little) was heavily degraded, inactive, insoluble or
aggregated and was thus of limited use.

Expression was also attempted without success using the Invitrogen™ DES system with the
pMT/V5-His vector and S2 Drosophila cells.

GST-Tpl2 from rat has also been expressed from a plasmid where CDC28 was used rather than
MOB1 as the essential gene (see Figure 37 and the section regarding expression of two proteins below).
Larger scale preparations of GST-Tpl2 yielded approximately 0.5mg of protein from 25g of induced
cells (Figure 34).
In addition to full length Tpl2, three deletion derivatives have also been expressed: an N-terminal
deletion which lacks 30 residues, a C-terminal deletion lacking 78 residues which mimics a naturally
occurring oncogenic form of the protein, and an N- and C-terminal derivative which combines both of
these deletions (Figure 33).

As Tpl2 and p105 interact in vivo, one test of the functionality of the proteins produced in yeast was
to test for their interaction in vitro (Figure 35). Glutathione sepharose beads loaded with GST-Tpl2,
GST, or GST-PLKA (see Figure 31) were mixed with 6His-p105 that had been eluted from a nickel
sepharose column (see Figure 33). Lane 3 of Figure 35 shows that 6His-p105 was retained by the
GST-TPL2 beads but not by beads carrying GST (lane 5) or GST-PLKA (lane 1). Thus p105 and
Tpl2 produced in this yeast system are able to interact in vitro as they do in vivo.

Expression of other proteins 

Essentially similar procedures were used to produce GST-tagged S.cerevisiae Cdc16, Bfa1, Bub2,
Tem1 and three deletion derivatives of Clb6 that contain the cyclin box domain. With Bfa1 and the
Clb6 deletions, over-expression of the expressed proteins was toxic and reduced cell growth during
the galactose induction period. To compensate for this, 500ml of overnight culture of these cells in
YEP + 2% raffinose was used to inoculate a further 1 litre of YEP medium with a final concentration
of 2% galactose. Induction then proceeded for 3-4h before harvesting.
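The amount of galactose needed for this scaled-up induction follows from simple arithmetic. The sketch assumes, as is usual for yeast media, that % means w/v (1% = 1 g per 100 ml), that the 2% refers to the combined 1.5 litres, and that all of the galactose is dissolved in the fresh litre of YEP before mixing.

```python
def solute_grams(percent_w_v: float, volume_ml: float) -> float:
    """Grams of solute for a % (w/v) solution: 1% w/v = 1 g per 100 ml."""
    return percent_w_v * volume_ml / 100


overnight_ml = 500   # raffinose overnight culture (no galactose)
fresh_ml = 1000      # fresh YEP that carries all of the galactose
total_ml = overnight_ml + fresh_ml

grams = solute_grams(2, total_ml)
fresh_percent = grams * 100 / fresh_ml  # strength of the fresh medium before mixing
print(grams)          # 30.0 g galactose
print(fresh_percent)  # 3.0 (% w/v in the fresh litre)
```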

The MOB1 expression system of the invention has been used to express full size Bfa1 (Figure 7),
Bub2 (Figure 8), Lte1 (Figure 9), Tem1, Cla4, Net1, Nud1, Dbf20, Spo12 (Figure 14), wild type and
kinase-dead Cdc15, TPL2 (Figure 10), an oncogenic C-terminally deleted TPL2, TPL2 deleted for 30
N-terminal residues, TPL2 deleted for both 30 N-terminal and 70 C-terminal residues, a kinase dead
mutant of TPL2, the SARS virus Nsp13 putative mRNA cap-1 methyltransferase (e.g. Figure 30,
which shows 6His-tagged SARS virus Nsp13 methyltransferase) and three deletion derivatives of
Clb6. All of these proteins have long histories of being difficult or impossible to produce in other
systems, but all of them give a GST-fusion product using the MOB1 system of the invention.

The following mammalian proteins have also been expressed in yeast using the method of the
invention: GST-HBP1, a histone binding protein from mouse (Figure 30); GST-fusions with
fragments of the polo domain of human PLK1 kinase (Figure 31); 6His- and GST-tagged mouse
kinesin KIF2C (Figure 32); 6His- and GST-tagged CHO kinesin MCAK (Figure 32); rat GST-TPL2,
a kinase involved in the regulation of the immune and inflammatory responses (Figure 33); human
6His-p105, a precursor of the NF-κB transcription factor and regulator of TPL2 (Figure 33); and
human Abin2, a protein which interacts with Tpl2 (Figure 33).

The Mitotic Exit Network

The Mitotic Exit Network (MEN) of S. cerevisiae controls the final phase of mitosis. The activity of
the MEN is governed by a small GTPase called Tem1 which in turn is negatively regulated by a two-
component GTPase activating protein (GAP) formed from the Bfa1 and Bub2 proteins. Positive
regulation of Tem1 is thought to be provided by Lte1, a putative nucleotide exchange factor whose
activity appears to be influenced by the kinase Cla4. Tem1 determines the activity of a kinase
cascade comprising Cdc15 and Dbf2 and its cofactor, Mob1. Dbf20 is a homologue of Dbf2.
Downstream effectors of Dbf2/Mob1 include the protein phosphatase Cdc14. Cdc14 is partly
regulated by combining with Net1 in an inaccessible form in the nucleolus. Dbf2/Mob1 may also
affect the activity of the protein degradation pathway specified by the ubiquitin ligase, the APC
complex.

Lte1 is a large yeast protein (>1400 amino acids). It could not be expressed as a GST-fusion
protein using either the pGEX-KG E.coli expression system or the pBacPak baculovirus system. In
contrast, expression using the MOB1 system of the invention gave high-level expression of the
fusion protein in soluble form (Figure 9).

Tem1 is a small Ras-like GTP-binding protein in the regulatory cascade of the mitotic exit network
[34,59]. Expression in E. coli was attempted with a variety of vectors: pGEX-KG (GST-fusion) and
pET28 (hexahistidine tag) did not give useful expression, although small quantities of MBP-Tem1
were obtained from a pMAL-c2X vector [34]. Expression of N-terminal fragments (amino acids
1-228 or 1-190) and of a Q79L mutant were also tested in various E.coli vectors, with no success. A
hexahistidine fusion was tested without success in P.pastoris using the pPICZB vector, and the
pBacPak8 GST-fusion system also failed in baculovirus. In contrast, expression using the MOB1
system of the invention gave high-level expression of the GST-Tem1 fusion protein in soluble form.

Bub2 is part of a GTPase-activating protein complex involved in the mitotic exit network [34]. 

Expression of Bub2 was attempted in E.coli using the vectors pGEX-KG, pMAL-c2X and pET28, but
only the GST-fusion was expressed, and this was with large amounts of E. coli GroEL chaperone
protein. Expression of fragments (amino acids 36-258) and of a GST-Bub2-His6 protein were also
tested in various E.coli vectors, with no success. The pPICZαA vector failed in P.pastoris, as did the
pBacPak8 and pBAC4X vectors in baculovirus. In contrast, expression using the MOB1 system of
the invention gave high-level expression of the GST-Bub2 fusion protein in soluble form (Figure 8).

Bfa1 is the other half of the GTPase-activating protein complex (Bfa1/Bub2) [34]. Expression of
Bfa1 was attempted in E.coli using the vectors pGEX-KG, pGEX-His and pMAL-2c. Only MBP-
fusion proteins could be expressed successfully. The pPICZB vector failed in P.pastoris, as did the
pBacPak8 vector in baculovirus. In contrast, expression using the MOB1 system of the invention
gave high-level expression of the GST-Bfa1 fusion protein in soluble form (Figure 7).

GST-Nsp13 expressed from pGEX-6P-2 in E. coli was insoluble, but soluble GST-Nsp13 was obtained
using the MOB1 system. After cleavage of the fusion protein with human rhinovirus protease
(PreScission Protease), yields were approximately 1mg Nsp13/litre of induced cells.

Figure 14 shows glutathione sepharose affinity purification of GST-Tem1 and its negative regulators
GST-Bfa1 and GST-Bub2. Bub2 has sequence homology with canonical GTPase activating proteins
(GAPs) but is only active as a GAP when associated with Bfa1.
Lte1 has been expressed as either a GST- or 6His-fusion protein from either pMG1 or pMH919.
Lte1's putative regulatory kinase Cla4 has also been expressed as a GST-fusion protein. These
proteins have been purified by affinity chromatography (Figure 15A). In in vitro kinase assays, Cla4
is able to phosphorylate 6His-Lte1, as judged both by the incorporation of radioactive label from
[γ-32P]ATP and, with excess ATP, by the decrease in electrophoretic mobility typical of modified
proteins (Figure 15B).

The putative nucleotide exchange activity of Lte1 was confirmed in in vitro assays, which monitored
the loss of radiolabelled GDP from the Tem1/Bfa1 complex. In this assay, addition of Lte1
accelerated the loss of GDP, consistent with the activity of an exchange factor (Figure 15C). Thus, the
recombinant 6His-Lte1 produced in yeast displayed its predicted biochemical activity in vitro.

The kinase Cdc15 is the downstream effector of Tem1. Wild type and kinase dead (K54L) forms of
GST-Cdc15 (Figure 16A) have been produced using the expression system of the invention. Figure
16B shows that wild type GST-Cdc15 phosphorylated GST-Mob1, GST-Mob1+Dbf2 N305A, and
the artificial substrate, myelin basic protein. The kinase dead form of GST-Mob1+Dbf2 N305A was
used as a substrate here to eliminate additional phosphorylation events produced by this second
kinase. GST-Cdc15 with a K54L mutation in the kinase site was unable to phosphorylate any of
these substrates. Thus, Cdc15 can be prepared using the expression system and displays the
biochemically appropriate activities in vitro.

The GST-Mob1/Dbf2 kinase dead complex mentioned above was produced by a variant of plasmid
pMG1 which was reconfigured to express GST-MOB1 from the GAL1-10 promoter rather than the
native MOB1 (Figure 11B). This was possible because GST-MOB1 is still able to complement and
maintain the viability of a Δmob1 strain. Untagged Dbf2 was expressed from the other side of the
GAL1-10 promoter (Figure 11B). Because of the stoichiometric binding of Dbf2 with Mob1, it was
possible to prepare untagged Dbf2 by co-purification with GST-Mob1. Wild type (wt), N305A
kinase dead (kd), and hyperactive forms of Dbf2 were prepared in this way (Figure 17).

The kinase activity of GST-Mob1 + wild type and mutant forms of Dbf2 was examined. Both wild
type and hyperactive kinases were able to phosphorylate the artificial substrate, Histone H1 (Figure
17C), although phosphorylation was more efficient with the hyperactivated form of Dbf2. In
addition, wild type and hyperactive GST-Mob1+Dbf2 displayed autophosphorylation (Figure 17C)
while the kinase dead form did not (Figure 19).

Furthermore, when GST-Mob1 + wild type Dbf2 was phosphorylated by Cdc15, Dbf2 kinase
activity towards Histone H1 was increased (Figure 16C). This is in agreement with earlier data
obtained by different means, and so indicates that properly functional Mob1+Dbf2 complex is
produced by the yeast expression system of the invention.
The natural substrates of Mob1+Dbf2 kinase have not previously been reported. However, these
results show that this kinase has activity in vitro towards components of the APC ubiquitin ligase
complex (Figure 19) and towards the downstream MEN effector, Cdc14 (Figure 20).

GST-Apc1, GST-Cdc16 and GST-Cdc23 were individually prepared using the yeast expression system
(Figure 18). GST-Apc1 and GST-Cdc16 were both phosphorylated by GST-Mob1 + wild type Dbf2,
but GST-Cdc23 was not (Figure 19). Autophosphorylation of GST-Mob1 + wild type Dbf2 was also
clearly seen. In contrast, control GST-Mob1 + kinase dead Dbf2 was unable to phosphorylate any of
these substrates or undergo autophosphorylation.

The above data therefore show that a complex of GST-Mob1 with wild type and mutant forms of
Dbf2 kinase can be purified using the yeast expression system of the invention and that these
complexes display the appropriate biochemical activities in vitro.

Cdc14 is known to be a phosphatase and effector of several events at the end of mitotic exit.
GST-Cdc14 was produced in the yeast expression system and proved to be a good substrate for
GST-Mob1 kinase activity (Figure 20). Deletion and point mutant forms of GST-Cdc14 were
produced to map the sites of in vitro phosphorylation by GST-Mob1+Dbf2. Using four deletion
derivatives, phosphorylation was mapped to the C-terminal region of Cdc14 (Figure 20). Point
mutations at several putative phosphorylation sites in this region of the purified GST-Cdc14 further
localised the amino acids subject to Mob1/Dbf2 kinase activity (Figure 20B).

The functionality of these forms of Cdc14 was assayed in vitro using the chromogenic
phosphatase substrate p-nitrophenyl phosphate. Phosphatase activity on p-nitrophenyl phosphate can
be detected spectrophotometrically by an increase in absorbance at 410nm. Figure 21 shows the
phosphatase activity of full length, wild type GST-Cdc14. The relative in vitro phosphatase activity
of wild type GST-Cdc14 and several multiple point mutant derivatives is presented in Figure 22.
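A rate of absorbance increase can be turned into an amount of p-nitrophenol released per minute with the Beer-Lambert law (A = ε·c·l). The sketch below assumes a molar absorptivity of 18000 M⁻¹cm⁻¹ for p-nitrophenolate and a 1 cm path; both are ASSUMPTIONS to check against a standard curve for the buffer, pH and wavelength actually used, and the reading shown is hypothetical. Relative activities, as in a comparison of mutants with wild type, are simply ratios of such rates.

```python
def rate_umol_per_min(delta_a_per_min: float, volume_ml: float,
                      epsilon_per_m_cm: float = 18000.0,
                      path_cm: float = 1.0) -> float:
    """Micromoles of p-nitrophenol released per minute, from the rate of
    absorbance increase, via Beer-Lambert (A = eps * c * l).
    The default epsilon is an ASSUMED value for p-nitrophenolate."""
    conc_m_per_min = delta_a_per_min / (epsilon_per_m_cm * path_cm)
    # mol/l/min * litres -> mol/min; * 1e6 -> umol/min
    return conc_m_per_min * (volume_ml / 1000) * 1e6


def relative_activity(mutant_rate: float, wild_type_rate: float) -> float:
    """Activity of a mutant as a fraction of wild type."""
    return mutant_rate / wild_type_rate


# HYPOTHETICAL reading: A410 rising by 0.18 per minute in a 1 ml cuvette.
wt = rate_umol_per_min(0.18, 1.0)
print(round(wt, 3))  # 0.01 umol/min
```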

Finally, Cdc14 activity in vivo is blocked by interaction with the nucleolar protein Net1. GST-Net1
was produced using the expression system (Figure 23A) and tested for its effects on Cdc14 activity.
The addition of GST-Net1 clearly reduced the in vitro phosphatase activity of GST-Cdc14 (Figure
13). Thus, GST-Cdc14 produced with the yeast expression system has the appropriate phosphatase
activity in vitro and, as in vivo, it can be negatively regulated by GST-Net1.

Further yeast proteins 

PP2A (S. cerevisiae protein phosphatase 2A) is a multifunctional protein phosphatase. In budding
yeast the Tpd3 subunit acts as a scaffold to two alternative enzymatic subunits, Pph21 or Pph22, and
one of two alternative regulatory subunits, Cdc55 or Rts1. All five subunits can be expressed as
GST-fusion proteins in the yeast expression system of the invention (Figure 24). When GST-Cdc55
was prepared from yeast it was active, as judged by its ability to use p-nitrophenyl phosphate as a
substrate (see above). The raw data for this activity, showing an increase in absorbance of the in vitro
reaction mixture at 410nm, are presented in Figure 25. In the preparation of GST-Cdc55, sufficient
amounts of endogenous PP2A components were co-purified to permit activity.

Clb6 (S. cerevisiae) is one of nine cyclin regulators of Cdc28, the major budding yeast cell cycle
regulatory kinase. Three deletion derivatives of Clb6 containing the so-called cyclin box were
expressed as GST-fusion proteins (Figure 26).

Rgd1 (S. cerevisiae) is a GTPase activating protein for the GTPase Rho. GST-Rgd1 was expressed
from plasmid pMG1 in the MGY70 expression strain (Figure 27).

Ubc4 (S. cerevisiae) is an E2 ubiquitin conjugating enzyme which acts with the APC complex to
ubiquitinate proteins and so direct them for protein degradation. A large scale preparation of
GST-Ubc4 was undertaken to quantitate the yield of expressed protein. Figure 28 shows the
GST-Ubc4 eluted with reduced glutathione from a glutathione-sepharose column. It also shows that
less than 5% of material was retained by the purification matrix after elution. 5mg GST-Ubc4 was
prepared from 25g of induced cells.
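Normalising the two large-scale yields reported in this specification to the mass of induced cells makes them directly comparable: 5mg GST-Ubc4 from 25g here, and approximately 0.5mg GST-Tpl2 from 25g for the rat GST-Tpl2 preparation from the CDC28-based plasmid. The sketch is plain arithmetic on those reported figures and says nothing about why the yields differ.

```python
# Reported yields: (mg purified fusion protein, g of induced cells)
YIELDS = {
    "GST-Tpl2 (CDC28 plasmid)": (0.5, 25),
    "GST-Ubc4": (5.0, 25),
}

per_gram = {name: mg / grams for name, (mg, grams) in YIELDS.items()}
for name, value in per_gram.items():
    print(f"{name}: {value:.2f} mg per g of induced cells")

ratio = per_gram["GST-Ubc4"] / per_gram["GST-Tpl2 (CDC28 plasmid)"]
print(f"GST-Ubc4 / GST-Tpl2 yield ratio: {ratio:.0f}x")  # 10x
```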

Plo1 (Schizosaccharomyces pombe) is a multifunctional regulatory kinase that acts in the cell cycle.
Plo1 is a member of the Polo group of kinases. Plo1 was expressed in S.cerevisiae MGY70 as a
GST-fusion protein and displayed in vitro kinase activity towards myelin basic protein (MBP)
(Figure 29).

Optimisation of expression - Galactose requirement for induction of expression 
Expression of recombinant genes using pMG1 is induced by growth in rich medium with galactose
as carbon source. In routine yeast culture, carbon sources are arbitrarily provided at 2%. In larger
scale preparations considerable amounts of galactose might be used. Therefore, the minimum level of
galactose actually required for induction was determined. Also, as the cost of this ingredient can
vary approximately five-fold, cultures were tested to determine whether there was any appreciable
difference between the cheapest and most expensive forms of galactose.

An expression strain was constructed from the standard expression host MGY70 containing a
derivative of pMG1 expressing S. cerevisiae Ubc4 as a GST-fusion protein. Figure 12 compares the
yields of GST-Ubc4 when expression was induced with 2%, 1%, 0.5% or 0.2% galactose. The
experiment also compared the efficacy of galactose from two manufacturers differing in price by
6-fold. The results show that 1% galactose from either source is sufficient for induction. Although
yields with 0.5% of the more expensive galactose are slightly higher than with the cheaper galactose,
it is less expensive to use 1% of the cheaper galactose as the routine means of inducing expression.
Thus, while the more expensive galactose may be more appropriate for pharmaceutical preparation,
to ensure that the highest levels of purity are maintained in accordance with good manufacturing
practice, the cheaper galactose may be used in experimental conditions with no detrimental effects
on the results obtained.
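The cost argument can be made concrete. Taking the 6-fold price difference stated above and counting cost in units of the price of one gram of the cheaper galactose, the sketch compares a litre of medium at 1% cheap, 0.5% expensive and the routine 2% cheap galactose. It assumes % means w/v and ignores every other medium component.

```python
PRICE_RATIO = 6.0  # expensive galactose costs ~6x the cheap source (from the text)


def galactose_cost(percent_w_v: float, relative_price: float,
                   litres: float = 1.0) -> float:
    """Cost in units of 'one gram of cheap galactose' for `litres` of medium."""
    grams = percent_w_v * 10 * litres  # 1% w/v = 10 g per litre
    return grams * relative_price


cheap_1pc = galactose_cost(1.0, 1.0)
dear_half_pc = galactose_cost(0.5, PRICE_RATIO)
routine_2pc = galactose_cost(2.0, 1.0)
print(cheap_1pc, dear_half_pc, routine_2pc)  # 10.0 30.0 20.0
print(dear_half_pc / cheap_1pc)              # 3.0
```

On these assumptions 1% of the cheaper galactose costs a third as much as 0.5% of the more expensive one, and half as much as the routine 2%.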
Optimisation of expression - Use of glucose prior to induction

The expression system can include a mechanism by which the copy number of the expression plasmid is
increased to compensate for the effect of glucose in reducing the expression of the MOB1 selection
gene from the GAL1-10 promoter. This mechanism was demonstrated in two ways.

First, glucose was shown to increase the plasmid copy number when the selection gene is expressed
from the GAL1-10 promoter. The copy number of two plasmids of comparable sizes was assessed where
expression of the selective MOB1 gene was controlled either by the GAL10 promoter or by the
natural MOB1 promoter. 10* yeast cells carrying one plasmid or the other were grown in rich
medium containing 1% glucose. Relative plasmid numbers were quantified by extracting DNA and
performing transformations of competent E. coli DH5 with equal volumes of plasmid preparations
from the two types of yeast.



MOB1 gene on plasmid expressed from    Yield of E. coli transformants
MOB1 promoter                          565
GAL10 promoter                         1105



The table shows that when MOB1 is expressed from the GAL1-10 promoter there is an
approximately two-fold increase in plasmid copy number. This is the result expected if glucose
repression of the GAL1-10 promoter limited the supply of the essential Mob1 protein
and forced a compensatory increase in copy number.
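The "approximately two-fold" figure is the ratio of the two transformant counts in the table. The sketch computes it and, assuming that transformant yield is proportional to plasmid copies per cell and that the colony counts carry only Poisson counting noise (a single measurement, so no other error estimate is available), attaches a rough uncertainty to it.

```python
import math

TRANSFORMANTS = {"MOB1 promoter": 565, "GAL10 promoter": 1105}

gal = TRANSFORMANTS["GAL10 promoter"]
native = TRANSFORMANTS["MOB1 promoter"]
fold = gal / native

# ASSUMPTION: Poisson counting error only; relative errors add in quadrature.
rel_err = math.sqrt(1 / gal + 1 / native)
print(f"{fold:.2f} +/- {fold * rel_err:.2f}-fold")  # 1.96 +/- 0.10-fold
```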

A second assay directly determined the effect of glucose on expression of a cloned gene carried by
pMG1. An expression strain was constructed from the standard expression host MGY70 containing a
derivative of pMG1 expressing mouse TPL2 as a GST-fusion protein. Prior to induction of
expression by growth in medium containing 1% galactose, overnight 'precultures' were grown in 1%
sucrose plus glucose at 1%, 0.5%, 0.2%, 0.05% or 0%. After 6h induction in 1% galactose medium,
GST-TPL2 was prepared (Figure 13). The yield of GST-TPL2 was greatest when 0.05% glucose was
included in the preculture. Greater amounts of glucose were less effective, possibly because residual
amounts might remain in the induction culture and antagonise the subsequent galactose-induced
activation of the promoter. Therefore the invention requires only very low levels of glucose
for induction of expression, thus reducing costs.

Hetero-oligomers

Although MOB1 has been used as the essential selection gene for all the work described above, this
section shows that, by employing a second essential gene for selection, a yeast expression system has
been constructed to express two recombinant proteins simultaneously from two expression plasmids.

One class of expression plasmid includes all the MOB1/TRP1-based vectors described above and in
Figures 3 and 11. The second class of expression plasmids utilises the essential gene CDC28 for
selection, rather than MOB1, and has HIS3 as an auxotrophic marker instead of TRP1. pMH925 is
designed to produce proteins with a GST tag and pMH927 is designed to make 6His-tagged products
(Figure 36A&B). The two classes of plasmids both use the divergent GAL1-10 promoter and can
express either GST- or 6His-fusion proteins. The expression cells have chromosomal deletions of
the essential MOB1 and CDC28 genes, which are made by the methods described above. They are kept
alive by a third, covering plasmid which has a URA3 selective marker and which expresses both
MOB1 and CDC28 genes from their endogenous promoters.

Use of this system is essentially the same as for the single expression system. Coding sequences are
cloned into the two types of expression vectors. The vectors are transformed into the expression
strain, selecting for tryptophan and histidine prototrophy. The transformants are grown on medium
containing 5-fluoroorotic acid to select for loss of the 'covering' URA3 MOB1 CDC28 plasmid. The
loss of the covering plasmid produces a strain carrying two different expression plasmids whose
presence is maintained by selection for their essential MOB1 and CDC28 genes.

An example of the use of this system is shown where two proteins are co-expressed and, because of
their known affinity for each other, they also co-purify (Figure 37). A pMH925 CDC28-based
plasmid encoding GST-TPL2 was co-expressed with either a pMH919 derivative expressing 6His-
p105 or the 'empty' pMH919 vector expressing only the 6His affinity tag. Additional control cells
expressed the GST affinity tag from pMH925 with a pMH919 derivative expressing 6His-p105.
Lysates were prepared from these cells and GST- and 6His-tagged proteins were recovered by
affinity purification with both glutathione sepharose and nickel sepharose. This experiment shows
that GST-Tpl2 can be expressed from plasmids relying on a second essential gene, CDC28, for self
selection (lane 1). GST is also expressed from the CDC28-based vector which was co-expressed with
6His-p105 (lane 3). As expected, the 6His-p105 that was co-expressed with GST was not recovered
using glutathione sepharose in lane 3, but it was seen using nickel sepharose purification (lane 6).
Thus two different proteins can be co-expressed.

Co-expression was also seen in extracts from cells encoding GST-Tpl2 and 6His-p105. GST-Tpl2
was recovered after purification with glutathione sepharose (lane 2) while 6His-p105 was purified
from the same cells with nickel sepharose. Importantly, 6His-p105 also co-purified with GST-Tpl2
on glutathione sepharose (lane 2) but not with GST alone (lane 3). This indicates specific
co-purification of 6His-p105 with GST-Tpl2. Similarly, GST-Tpl2 co-purified with 6His-p105 on nickel
sepharose (lane 5) but not with the 6His tag alone (lane 4). Thus GST-Tpl2 and 6His-p105 are
co-expressed in forms that are able to interact and so co-purify.

In further examples, yeasts are made with chromosomal deletions of both MOB1 and CDC33. To
complement the deletions, the yeast are kept alive by a 'covering' plasmid expressing both MOB1 and
CDC33 and carrying a URA3 selective marker. To insert the heterologous gene products, one
plasmid is pMG1 as described above and the other is a similar plasmid where (a) MOB1 is replaced
by CDC33 and (b) the conditional selective marker HIS3 replaces TRP1. To allow separate purification,
the second plasmid uses an epitope tag, a hexahistidinyl tag or no tag rather than a GST fusion.

Heterologous sequences are cloned into the two expression plasmids. The two plasmids are
co-transformed into a yeast host, selecting for Trp+ and His+ prototrophy. Cells that have lost the
URA3 covering plasmid are then selected on FOA, giving cells capable of expressing two different
proteins.
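
The selection logic of this double plasmid shuffle can be checked with a toy model. The sketch below is illustrative only and rests on stated assumptions: pMG1 carries MOB1 and TRP1 (as implied by the description of the second plasmid), the host is viable only when both MOB1 and CDC33 are supplied from plasmids, and any cell still carrying the URA3 covering plasmid is killed on FOA. The labels "covering" and "second" and the helper grows are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Plasmid:
    name: str
    essential_genes: frozenset[str]  # chromosomal deletions this plasmid complements
    marker: str                      # auxotrophic selection marker


# Host carries chromosomal deletions of both essential genes.
DELETED = frozenset({"MOB1", "CDC33"})

COVERING = Plasmid("covering", frozenset({"MOB1", "CDC33"}), "URA3")
PMG1 = Plasmid("pMG1", frozenset({"MOB1"}), "TRP1")
SECOND = Plasmid("second", frozenset({"CDC33"}), "HIS3")


def grows(plasmids: set[Plasmid], required_markers: tuple[str, ...] = (), foa: bool = False) -> bool:
    """Toy growth rule: all deleted genes supplied, required markers present, no URA3 on FOA."""
    supplied = set().union(*(p.essential_genes for p in plasmids))
    markers = {p.marker for p in plasmids}
    if foa and "URA3" in markers:  # URA3-expressing cells are killed on FOA
        return False
    return DELETED <= supplied and set(required_markers) <= markers


# Step 1: co-transformants are selected for Trp+ and His+ prototrophy.
transformant = {COVERING, PMG1, SECOND}
print(grows(transformant, required_markers=("TRP1", "HIS3")))  # True

# Step 2: enumerate every plasmid set a transformant could segregate to; keep FOA survivors.
ordered = sorted(transformant, key=lambda p: p.name)
survivors = [
    sorted(p.name for p in combo)
    for r in range(len(ordered) + 1)
    for combo in combinations(ordered, r)
    if grows(set(combo), foa=True)
]
print(survivors)  # [['pMG1', 'second']]
```

In this model the only FOA survivor retains both expression plasmids, because each one supplies an essential gene that the other lacks.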

In related work, GST-Mob1 was expressed with untagged Dbf2 in mob1-deleted cells. Dbf2 is a
kinase and Mob1 is an accessory protein required for its activity. The divergent GAL1-10 promoter
expressed GST-Mob1 in one direction and untagged Dbf2 in the other. Purification of GST-Mob1 on
glutathione sepharose also yielded approximately equimolar amounts of untagged Dbf2,
demonstrating how hetero-oligomers can be purified.

Expression in Escherichia coli 

An E.coli BL21 derivative with good induction and protein stability characteristics is selected.

An essential gene for chromosomal deletion is chosen.

A covering plasmid based on pACYC184 is prepared, including: (a) the essential gene, amplified by
PCR from E.coli genomic DNA together with its natural promoter and regulatory sequences; (b) the
conditionally lethal sacB marker, to allow counterselection during confirmation of chromosomal
deletion and during plasmid shuffling; (c) a P15A replication origin; and (d) a chloramphenicol
selection marker. The plasmid is transformed into E.coli in preparation for deletion of the essential
chromosomal gene.

After introduction of the covering plasmid, the chromosomal copy of the essential gene is replaced
with a drug resistance marker using the methods described in reference 47 or 48. The drug resistance
marker allows inheritance of the modified gene to be followed. Confirmation that the essential gene
is provided by the covering plasmid and not by the chromosome can be obtained by attempting to
grow the bacterium in sucrose-based medium: because sacB is lethal on sucrose, growth requires loss
of the covering plasmid, which is impossible once the chromosomal copy has been deleted.
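
The logic of the sucrose test can be written as a small function. This is a sketch under the assumptions that sacB on the covering plasmid is the only source of sucrose sensitivity and that a cell lacking the essential gene cannot grow; grows_on_sucrose is a hypothetical name.

```python
def grows_on_sucrose(chromosomal_copy_present: bool) -> bool:
    """Return True if any cell state can grow on sucrose (hypothetical helper).

    A cell either keeps the sacB covering plasmid or loses it. Keeping it is
    lethal on sucrose; losing it is survivable only when the chromosome still
    supplies the essential gene.
    """
    for keeps_covering in (True, False):
        has_essential_gene = chromosomal_copy_present or keeps_covering
        killed_by_sacb = keeps_covering
        if has_essential_gene and not killed_by_sacb:
            return True
    return False


# No growth on sucrose is the signature of a successful chromosomal deletion.
for present in (True, False):
    print(f"chromosomal copy present={present}: grows on sucrose={grows_on_sucrose(present)}")
```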

An expression plasmid based on pETDuet (Novagen™) is prepared, including: (a) the essential gene;
(b) a mammalian, viral or other eukaryotic gene of interest; (c) two multiple cloning sites (MCS)
adjacent to tandem T7lac inducible promoters, with one MCS including a hexa-His tag; (d) a colE1
replication origin, which is compatible with the P15A origin used in the covering plasmid; and (e) an
ampR gene, which allows the plasmid to be distinguished from the covering plasmid. The two genes (a)
and (b) are under the control of the two T7lac promoters. A simpler system uses a normal pET or pGEX
vector with only a single MCS for receiving the mammalian gene; the essential gene, with its own
promoter, is first cloned into a non-MCS site.
The expression plasmid is transformed into the E.coli to give a bacterium carrying both the covering
plasmid and the expression plasmid.

Loss of the covering plasmid is then selected by growing the bacteria on sucrose. This growth stage can
be preceded by a period of growth in the absence of chloramphenicol, to provide an opportunity for
'natural' loss of the covering plasmid. After the sucrose counterselection, loss of the covering
plasmid is confirmed by checking for chloramphenicol sensitivity. After this confirmation there is no
need for further use of antibiotics during growth, as the expression plasmid is maintained because it
provides the essential gene rather than because of its ampR gene. The bacteria can thus be grown
through several cultures to eliminate any trace of chloramphenicol, giving an antibiotic-free
preparation of bacteria that can be used to express the mammalian protein without antibiotic
contamination.
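
The shuffle and the subsequent antibiotic-free maintenance can likewise be modelled. The sketch below is illustrative and assumes that the chromosomal copy of the essential gene has already been deleted, that the covering plasmid carries the chloramphenicol resistance marker and sacB, and that the expression plasmid carries the essential gene and ampR; plasmid loss is modelled simply by enumerating every subset of the plasmids a cell carries. All identifiers are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Contents assumed from the text; the chromosomal essential gene is already deleted.
PLASMIDS = {
    "covering": {"essential_gene": True, "markers": {"CmR", "sacB"}},
    "expression": {"essential_gene": True, "markers": {"ampR"}},
}


def survives(plasmids: frozenset[str], sucrose: bool = False, chloramphenicol: bool = False) -> bool:
    """Toy growth rule for a cell carrying the named plasmids."""
    markers = set().union(*(PLASMIDS[p]["markers"] for p in plasmids))
    if not any(PLASMIDS[p]["essential_gene"] for p in plasmids):
        return False  # no source of the essential gene
    if sucrose and "sacB" in markers:
        return False  # sacB is lethal on sucrose
    if chloramphenicol and "CmR" not in markers:
        return False  # chloramphenicol-sensitive
    return True


def segregants(plasmids: frozenset[str]) -> list[frozenset[str]]:
    """Every plasmid set a cell could end up with after plasmid loss."""
    items = sorted(plasmids)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


start = frozenset({"covering", "expression"})

# Sucrose counterselection leaves only cells that have lost the covering plasmid.
after_sucrose = [s for s in segregants(start) if survives(s, sucrose=True)]
print(after_sucrose)  # [frozenset({'expression'})]

# Confirmation step: survivors should be chloramphenicol-sensitive.
cm_sensitive = all(not survives(s, chloramphenicol=True) for s in after_sucrose)

# Antibiotic-free growth: every viable descendant still carries the expression plasmid.
retained = all(
    "expression" in s
    for cell in after_sucrose
    for s in segregants(cell)
    if survives(s)
)
print(cm_sensitive, retained)  # True True
```

In this model the only sucrose survivor carries the expression plasmid alone, is chloramphenicol sensitive, and cannot lose the expression plasmid without dying, which is why no ampicillin is needed during growth.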

Bacteria are cultured and then induced under standard conditions using IPTG. The mammalian protein
is expressed as a GST fusion protein, which is then purified using the appropriate affinity column.
The native protein is released by thrombin cleavage to give a final purified product.

In a further development, the expression plasmid includes the oriV/TrfA replicon system for
copy-number amplification, as disclosed in reference [60].

It will be understood that the invention has been described by way of example only and modifications
may be made whilst remaining within the scope and spirit of the invention. 